

Journal of
Educational 
Research 


OFFICIAL ORGAN OF THE NATIONAL ASSOCIATION
OF DIRECTORS OF EDUCATIONAL RESEARCH


B. R. BUCKINGHAM, EDITOR
ASSOCIATE EDITORS


E. J. ASHBAUGH    W. W. CHARTERS    GEORGE D. STRAYER
GUY M. WHIPPLE    S. A. COURTIS     LEWIS M. TERMAN
WALTER S. MONROE


VOL. IV SEPTEMBER, 1921 No. 2
CONTENTS
THE MEASUREMENT OF TEACHING EFFICIENCY ..... J. B. Sears 81
THE NEW PLAN OF ADMITTING STUDENTS AT COLUMBIA
UNIVERSITY ..... Dean H. E. Hawkes and Dr. A. L. Jones 95
METHODS OF EQUALIZING THE RATING OF [...]
COOPERATIVE CHEMISTRY TESTS ..... Seth Hayes 109
RELIABILITY OF BINET SCALE AND PEDAGOGICAL [...]
[...] Otis and H. E. Knollin 121
NEWS ITEMS AND COMMUNICATIONS ..... 151
NATIONAL ASSOCIATION OF DIRECTORS OF EDUCATIONAL
RESEARCH ..... 159


PUBLISHED FOR THE BUREAU OF EDUCATIONAL
RESEARCH, COLLEGE OF EDUCATION,
UNIVERSITY OF ILLINOIS


Journal of Educational Research


Published for the Bureau of Educational Research, University of Illinois
Editor-in-Chief: B. R. Buckingham 
Director of Bureau of Educational Research, University of Illinois 


After September first, Ohio State University, Columbus, Ohio. 




The Public School Publishing Company conducts the subscription work. Send all orders to them at Blooming,


The JOURNAL OF EDUCATIONAL RESEARCH is both needed by and needs the subscriptions of many more
teachers and school officers


Articles to Appear Soon 


A Cycle-Omnibus Intelligence Test for College Students by L. L. Thurstone.

Analysis of Reading Ability by S. A. Courtis. 

Measuring Progress by Means of Standardized Tests by S. S. Brooks. 

Arithmetic Ability of Men in the Army and Children in the Public Schools by Arthur 
Kolstad. 

Correlation: How to Work It on the Adding Machine by Herbert A. Toops. 

Rate of Progress in Teacher Preparation by W. Randolph Burgess. 

A Series of Standardized Diagnostic Tests for the Fundamentals of Algebra by H. R.
Douglass.


Copyright 1921, by
Public School Publishing Co.


JOURNAL OF EDUCATIONAL
RESEARCH


SEPTEMBER, 1921


THE MEASUREMENT OF TEACHING EFFICIENCY¹

J. B. SEARS
Leland Stanford Junior University


THE PROBLEM 


The measurement movement.— During the past decade we have 
been busily engaged in the development of a new terminology
in education, a quantitative terminology. We have been trying
to measure the different machinery, processes, and products
of the schools, and by these measurements not only to standardize,
but also to rationalize every step in our procedure. Pedagogical
and mental tests and scales, building scales, units of cost, hygienic
standards, age-grade-progress norms, teacher rating systems,
and standardized college entrance examinations, readily suggest
to our minds the many aspects of education to which we have
applied this quantitative language.

The whole measurement enterprise in education is very young, 
and so there can be no discouragement to thoughtful people, 
either in the fact that most of our standards are crude, and as yet 
only partially reliable, or in the further fact that much of the 
field is still unexplored. On the other hand there is substantial 
hope in the fact that the new methods and standards are being 
put to use in the schools as rapidly as they are worked out, and 
that everywhere the results of their use speak in unmistakable 
terms of their practical contribution to education.

Our subject here has to do with the measurement of teaching 
efficiency, and represents one of the many problems which to- 
gether have made up the important movement which we have 
briefly suggested above. Judged, either by the demand for a 
purely objective scale of measure, or by the extent to which the 
“general impression” method of rating teachers still dominates in 

¹ Read before a meeting of the department of secondary education of the
Minnesota State Teachers Association at St. Paul, November 4, 1920.




practice, it is fair to say that this problem has not yet found a
satisfactory solution. Judged, however, by the scientific work
that has been done on the subject, and reported in the bibliography
of 55 titles here appended, one is inclined to say that the solution
is not far distant. Even a cursory reading of this literature shows
clearly three things: first, that practical school men are persistent
in their demand for such a device; second, that the methods of
studying the subject have become increasingly scientific; and
third, that the time is now ripe for a careful experimental study
which should result in the formulation of a rating scheme [...]
formally adopted and used by the teachers of the country.

Practical aspects of the problem. Just what is this problem?
From the standpoint of school practice it may be briefly stated
as follows: In teaching as in everything else, the quality of
work varies all the way from excellent to very poor, or even
vicious. In our nearly three centuries of experience in education
we have finally come to a fairly general acceptance of the principle
that the school should reward excellent and penalize poor service.
This can only be done satisfactorily when we find a means of stating
in exact terms the degree of success attained in a given
instance. We have tried in numerous cities to base promotion
on merit and almost invariably the attempt has finally broken
upon the rock of “what is merit?” General impression and mere
opinion have done their best and failed, and we are able to recognize
their failure. If we are to establish the principle that merit
counts in our profession, then, what we need is a satisfactory
means for measuring merit.

It is not from this angle alone, however, that we are beginning
to recognize an insistent demand for a solution to this problem.
The work of training teachers is becoming increasingly important
and as the process becomes more scientific the need for a means
of measuring the growth of the teacher in training becomes urgent.

What a saving in energy would be effected, what financial
waste would be checked, what an amount of justice would be
established, and what a professional stimulus would result, if we
had tests or instruments of measure by means of which we could
predict the success of an applicant for teacher training work or for
a teaching position, measure the rate of progress of the teacher
in training, and evaluate the work of teachers in service!




An equally genuine need exists, too, for scientific instruments
and aids that will aid in the diagnosis of the teaching and
supervisory processes. It is not merely standards that we need,
but a more intimate and exact knowledge of the processes
and products to be standardized. Supervisors of teachers in
training, as well as supervisors of teachers at work, are beginning
to feel this need and the reason they recognize it is because
they are attempting to rationalize the weakest process in education,
that of supervision. To measure, then, in order that
we may define and recognize merit; and to diagnose, in order
that we may rationalize and perfect our processes of teaching
and supervising, these are the educational needs which reveal to
us the practical aspects of our problem.


Theoretical aspects of the problem. On the technical or theoretical
side the problem is that of defining “teaching success.” This
calls for an analysis of the teaching process, a process which we
recognize to be very complex. The teacher instructs, manages,
and disciplines children, within the limitations of a certain
environment over which she has at least partial control. There
are certain general and many very specific educational objectives,
in terms of which she must carry on her work. The amount of
success is determined by the extent to which and the degree
of economy with which right teaching objectives are attained.

Shall we try to define “success” in terms of the teacher’s
qualities and virtues, or in terms of definable results which
she produces in the classroom with children, or in terms of both?
A test by which we hope to predict the success of a teacher,
or of a student teacher, must obviously deal with the personal
and professional qualities of the teacher herself, since as yet there
are no results to measure. And, since teachers are not generally
in control of the same class for two years in succession it would
seem practically necessary in all cases to have some measure of
the teacher as well as of results.

What results shall we measure, and what of the qualities
and traits of the teacher shall we measure? In what proportion
should the two stand in our final success score? In the measurement
of personal traits is our final score to be the sum of equal
amounts of a number of traits, or is a complicated system of
weighting necessary to express the true total value of all the
items involved? Add to these questions the necessity for
measuring the reliability of our final score once we have
established it, and we have before us the technical aspect of our
problem.
HISTORY OF TEACHER RATING SCHEMES

[...] exact methods of defining [...] efficiency [...] evidenced in
the general literature [...] the discussion [...] merit [...] dates
[...] years. The writer has found no study that [...] subject that
was made before 1905 [...] scheme appeared before that of [...]. [...]
mean that city superintendents [...] their appointments and promotions
[...] or that they had not attempted to [...]. They had been doing
both of these for [...] merit had been merely estimated in terms of
[...] with no attempt at scientific measurement.

Indirect studies of teaching success. In 1905, W. F. Book [...] in
1907 H. E. Kratz sought to inquire into the elements [...] high-school
teachers by making a study of [...] high-school pupils. This indirect
method of approach [...] slightly different angles by Littler in 1914,
[...] of elementary school teachers; by Moses [...] studied the
failure of high school teachers; [...] in 1915, who studied causes of
failures among [...] various sizes; by Anderson in 1917, who collected
judgments on the relative importance of 15 different [...]; by Colvin
in 1918, who studied the most [...] beginning teachers in high school.
The questionnaire [...] characterizes most of these studies, each of
which [...] intended to [...] throw some light on the factors
essential to [...] success.

The statistical treatment of much of the data [...] these studies was
good, and the results are of some value in that they tend to confirm
our previous general [...] are the weak points in the teaching process
[...]

² In order to avoid repetition of footnotes and at the same time [...]
all such titles [...].


[...] among those engaged in training teachers; [...] their approach,
and at best [...] the real factors in teaching [...].

A second group of studies has [...] the subject from the point of
[...] of school [...] to our knowledge of [...]. [...] differ from the
above group not only in the point of [...] the fact that they offer
more thorough statistical [...]. The first of these studies was made
[...] 1910, and was followed by those of [...] and 1915, Clapp in 1915,
Anderson in 1917, Landsittel [...] and Moody in 1918 and Fordyce and
Twiss [...]. In all these studies the attempt is made to show the
relation [...] individual factors in success [...] general merit.
Different [...] methods are used by different studies but all express
their findings in terms of correlations [...]. In some cases the [...]
upon which these studies are based were made in answer to a
questionnaire, in others they had been recorded in the form of
grades which in most cases are little different from [...]. An attempt
has been made to [...] “general merit” or “teaching success.” In doing
this each writer has arbitrarily chosen such terms as he believed
would express [...] recognizable qualities of the teacher, or clearly
recognizable factors in teaching efficiency. In these terms there is
variation both as to number and name, as well as in the matter of
[...] them into main and subordinate divisions. Some [...] terms of
correlations only, others convert their correlations into scores
after the fashion of Elliott’s [...] score card.

If the results of these studies do not prove conclusively that
teaching ability can be analyzed and expressed in objective terms
they strongly suggest that it can be. They have attempted [...] find
out [...] whether we all mean the same [...] teacher’s “personality,”
“power to discipline,” and “general success.” And further, what is
the relationship between each of [...] items [...] the burden of the


evidence is strongly affirmative. The contribution of these
studies is, therefore, a contribution in method and technique on
the one side, and in actual analysis of teaching success on the
other. And even if there is still some disagreement in results and
much overlapping in terms used; even if the devices proposed
are still vague, unreliable, and too cumbersome for general use,
surely this brief analysis of the work thus far done must convince
anyone that a substantial foundation has been laid.


Other types of studies. There are two other studies of a
somewhat different type that must be included in any review of the
scientific work done on teacher measurement. These are the
studies of Jones and Rugg. In 1917 Jones conceived the idea
of trying to describe the results of a teacher’s work by measuring
the mental behavior of her pupils. He chose two elementary
classes in educational psychology which had been taught by two
teachers, one of whom emphasized memory, and the other reasoning.
Two tests, one useful in measuring memory, the other reasoning,
were given to these classes. The experiment is too small and
too inadequate in other ways to be at all conclusive, but the
results are suggestive, since the points of view of these two
teachers are clearly reflected in the results of the tests. Rugg
has devised a measuring device patterned after the army rating
scale, in which merit is expressed, not in absolute quantity but
in terms of rank order. A teacher is rated on each item by
comparison with a group of five other teachers who have been
chosen as illustrating five degrees of efficiency ranging from very
poor to very good. This plan of measurement has proved practical
in the army and, within certain limitations, should work in
education.
Characteristics of scales now available. Of the rating scales
now available for use some are merely off-hand analyses of
general merit, while others represent a more careful analysis
together with an attempt at a defensible weighting of the different
factors involved in general merit. Some have the score expressed
numerically, others, by comparison, or by relative position on a
scale of values. Of the latter, some provide for three possible
ratings of each item (as poor, medium, good) while others recognize
as many as ten divisions of merit. Some refer to the different
items of merit by name, others by a question, and in either case
a definition of the item may or may not be given. Finally [...]
useful largely as devices for inspection and supervision [...]
especially as self-rating plans.


THE NEXT STEP IN TEACHER MEASUREMENT

The problem of teacher measurement presents itself to us in
two aspects, the practical and the theoretical. In the first
we must clarify in our minds the functions that are to be
served, the practical criteria to which the make-up of such a
device must conform, and the how and where of its use. In the
second we must work out an analysis of “teaching success” so that
we shall be able to define the elements essential to such success,
and [...] be able to define the relationship that exists between
the factors or elements involved, together with the reliability of
the [...] assigned by users of the device.

[...] to be served by measurement. With this general [...]
accomplishments before us, then, let us attempt to [...] a definite
point of departure for a further study of this [...]. In doing this
our first step is to set [...] our objectives, [...] clear just what
[...] we wish to serve by a measurement of [...] efficiency. These
may be stated as follows:

First, there ought to be a sorting process at the entering door
of teacher-training institutions. Some people [...] are foredoomed
to failure in teaching, regardless of training and good intentions,
and it is a source of waste and disappointment when such people are
trained for teaching. A test designed to measure the amounts of the
particular traits that are responsible for success might decide, in
a few minutes, questions which we have been answering only after
several years of [...]. To find those traits, define them, and
measure them [...].

Second, we need a means for measuring the progress which is
being made in training. Such a device would be valuable, not only
as a means of saying when a [...] has acquired the requisite [...]
of knowledge and skill to enter the profession, but also as a basis
for directing that training.

a 
Third, we need a test by means of which we may be able to
predict the degree of success that is likely to be attained by an
applicant for a teaching position. A test that will, [...] measure,
[...] the long and numerous conferences with [...], etc. [...]
teaching situation by utilizing the [...].

Fourth, we need to measure the efficiency of our teachers in
service, in [...] (a) we may supervise more [...], (b) promote [...]
remunerate in terms of merit, and (c) [...] understanding with a
rational basis for [...].

Fifth, we need to have teachers measure themselves [...]
professional stimulus they will derive [...] analysis of their own
work.

Sixth, [...] to measure teaching efficiency for the [...] it will
throw upon the teaching and supervisory processes. The writer regards
this function of teacher-efficiency [...] distinct and important
function, and not [...] an incident in, or by-product of, the
functions stated [...]. [...] tests have already shown how much [...]
added to [...] tests for diagnostic as well as for measurement
purposes; and there [...] believe that teaching tests may help
materially toward rationalizing the teacher-training, the teaching,
and the supervisory processes.

Criteria for [...] the needed devices. It is hardly possible that we
shall be able to devise any one test or plan of [...] serve all these
purposes. To meet [...] native endowment, since any glaring lack in
acquired traits would be obvious, either in the [...] the applicant or
in records of previous training [...] defects could still be overcome
during [...]. In addition to this our other aims call for a measure of
the [...] behavior [...] process [...].

Any test designed for use in accomplishing these ends must meet the
following requirements: (a) the measures must be as nearly objective
as possible; (b) they must be analytical; (c) there must be no
overlapping between factors; (d) the [...] must be properly weighted;
(e) the plan must not be cumbersome to use; and finally, (f) its
validity must be established.

[...] general teaching success. To accomplish the above ends within
these limitations, we have next to proceed [...] study of the teaching
process and of the qualities of [...] themselves. We can hardly hope
to be able to speak [...] separate traits or qualities of human nature
that [...] teaching success, nor of the separate functions [...]
teacher performs, with the same definiteness as that to which we are
accustomed in speaking of the organs of the body, [...] machine, or of
quantities of size or weight. Yet, [...] above reviewed argue anything
they argue clearly [...] certain of these factors in teaching success
[...] definiteness to be clearly understood by others. [...],
therefore, that of applying statistical methods to [...] “general
merit.” We must not confuse general [...], and terms used to describe
or designate [...] must be mutually exclusive. For instance, we [...]
except roughly, as shown by Boyce, Ruediger and Strayer, and others
what part “general intelligence” plays [...] doubtless affects any
value we might assign to “personality,” “professional interest,”
“ability to discipline,” etc. One of our [...], then, if we are to go
beyond the present achievement [...] teacher rating, is to find the
significance of general intelligence [...] “general efficiency” and
for all the special factors in “general [...].” With mental tests
available, this step is now only a [...] work. There are doubtless
other factors that do not [...] independently, and the value of which
depends upon their [...] with still other factors. To unravel this
tangle of [...] and general factors is our task.
Possible theoretical relationships between factors in “general
merit.” We may consider the factors which enter into general
teaching success according as they are independent or dependent,
equal or unequal,⁴ constant or not constant. From this
consideration there arise eight possible alternative conditions.
Teaching success may consist of factors which are:

1. Independent, equal, and constant.

2. Independent and constant, but not equal.

3. Independent and equal, but not constant.

4. Constant and equal, but not independent.

5. Not equal, not independent, and not constant.

6. Independent, but not equal and not constant.

7. Equal, but not independent and not constant.

8. Constant, but not equal and not independent.

³ A considerable amount of carefully collected data in the offices of
the department of research of the Oakland school system (Virgil E.
Dickson, director) indicates [...] is a rather [...] correlation
between “general intelligence” and “teaching [...].” It is hoped that
more exact methods of measuring “teaching success” [...] available in
order that such data as these may [...] contribution to this [...].

⁴ I.e., equally or unequally potent in contributing to success.
If condition No. 1 obtains, we have merely to find out what
are the factors, and how much, or how many units of each is
present, and then add them together to get “total teaching
success.” For instance if we have five possible amounts of “ability
to discipline,” and a given teacher is said to possess [...]
such amounts, then her “general merit” would be raised [...]
points or units by virtue of her possessing this amount of ability
to discipline.

If condition No. 2 obtains, it means that the different factors
contribute varying amounts to “total success.” One unit of
“ability to discipline” might, for instance, contribute twice as
much as one unit of “initiative and self-reliance,” in which case
the number of units of “ability to discipline” possessed by a
teacher would have to be weighted by doubling it. After the
items had all been weighted, then their sum would indicate
“total teaching success.”
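The arithmetic of conditions No. 1 and No. 2 can be sketched in a few lines of Python. This is an illustration only: the function name and the teacher's unit counts are assumptions made for the example, and only the doubling of “ability to discipline” comes from the text.

```python
def total_success(units, weights=None):
    """Sum factor units, each multiplied by its weight (default weight 1)."""
    if weights is None:
        weights = {}
    return sum(amount * weights.get(factor, 1) for factor, amount in units.items())


# Hypothetical ratings for one teacher, in units of each factor.
teacher = {"ability to discipline": 3, "initiative and self-reliance": 4}

# Condition No. 1: every unit of every factor counts equally.
unweighted = total_success(teacher)

# Condition No. 2: a unit of discipline counts twice as much (the text's example).
weighted = total_success(teacher, {"ability to discipline": 2})
```

With these assumed ratings the unweighted total is 3 + 4 = 7 and the weighted total is 2 x 3 + 4 = 10.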

If condition No. 3 obtains, then that means that amounts of
a given item would not contribute to “general success” in proportion
to the number of units of such items possessed. One unit of
“skill in discipline” added to “no skill in discipline” would [...]
not mean anything like as much as one unit added to three units.
In other words, up to a certain limit each unit added would mean
more to “general success” than any previous unit added. It
may be true, indeed it almost certainly is true, that not one but
many of the factors in teaching success vary in this or the opposite
way, or in both ways, in which case a special system of weighting
would be necessary for each item before adding to get “total
general success.”
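Condition No. 3 amounts to saying that the value of a unit depends on how many units are already possessed. A minimal Python sketch follows, assuming a purely hypothetical table of marginal values in which each added unit is worth more than the one before it; the text gives no such figures.

```python
# Hypothetical marginal values: the k-th unit of a factor is worth more
# than the (k-1)-th, up to the top of a five-unit scale.
MARGINAL_VALUE = [1, 2, 3, 4, 5]


def contribution(units):
    """Value contributed to general success by `units` units of one factor."""
    if not 0 <= units <= len(MARGINAL_VALUE):
        raise ValueError("units outside the scale")
    return sum(MARGINAL_VALUE[:units])
```

Under this assumed table, the first unit adds 1 to the total while the fourth unit adds 4, so one unit added to three units counts for more than one unit added to none.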

If condition No. 4 obtains, then we should have to discount
certain correlations that might seem to exist between “general
success” and each of several separate factors, because a certain
group of the separate factors are not independent. They contain
one or more elements in common. It is almost certain that
[...] the correlations reported in the studies above reviewed
[...] be explained in this way. General intelligence may be
[...] significant for each of all the separate factors involved
in “general teaching success.” If so, we could deal with the
separate factors as if they were independent. If it does not
affect them all equally, then here again we have the problem
of working out a correct system of weighting, to overcome this
lack of independence, i.e., this intercorrelation of separate factors.
The value of one factor might vary directly, or inversely as
another factor varies. A unit of “ability to discipline” might
vary in value as the square, or as the square root of the
number of units of “general intelligence.”
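The dependence described for condition No. 4 can be made concrete with a small Python sketch in which the worth of one unit of “ability to discipline” is set by the number of units of “general intelligence.” The function and parameter names are hypothetical, and the two rules shown are only the square and square-root cases the text mentions as possibilities.

```python
import math


def discipline_contribution(discipline_units, intelligence_units, rule="square"):
    """Value of discipline when each unit's worth depends on intelligence."""
    if rule == "square":
        unit_value = intelligence_units ** 2
    elif rule == "square_root":
        unit_value = math.sqrt(intelligence_units)
    else:
        raise ValueError("rule must be 'square' or 'square_root'")
    return discipline_units * unit_value
```

For example, two units of discipline held by a teacher with three units of intelligence would count 2 x 9 = 18 under the square rule, whereas with four units of intelligence under the square-root rule they would count 2 x 2 = 4.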

If any of the other possible combinations of the characteristics
of the separate factors in “general teaching success,” i.e.,
combinations of equality, constancy, and independence, obtain, then
some such system of weighting as has been suggested for combinations
2, 3, and 4 above would have to be worked out.
The next step. It is obvious that this is not a simple task,
but with a fairly clear conception of the theoretical possibilities
before us, and with the many correlations already reported for
certain factors, by Boyce and others, it would seem desirable as a
next step to proceed with our study of the factors already defined
by others, trying them out in new combinations, and under different
titles, until we shall have found a list of names or criteria
by means of which we can recognize the separate factors in
“teaching success.” Such a study will require time, and the earnest
cooperation of a large number of practical school people,
but that the end can be attained, that a simple and practical
device for measuring teaching efficiency can be worked out, I
[...] think there can be no doubt.


Let us follow up our studies of the correlation between “general
teaching success” and “training,” academic and professional;
between “general success” and “general intelligence”; between
“general success” and “health”; between “general success” and
each of the many separate abilities involved in teaching which
have already been worked out, as well as any new abilities or new
combination of abilities that can be recognized. We must find
out, for instance, whether “ability to discipline” when judged
as one of a total of five factors in “general success” correlates
with “general success” in the same way as it does when
judged as one of a total of twenty separate factors in “general
success.” And, as suggested above, we must find out the correlation
of “general intelligence” with each of the special abilities or
factors in “general success.”
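The correlations called for here are of the ordinary product-moment kind. A minimal Python sketch of the computation is given below, assuming two equal-length lists of ratings for the same teachers; the function name is hypothetical and no data from the studies reviewed are reproduced.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a list has no variation")
    return cross / math.sqrt(ss_x * ss_y)
```

The same function could be applied once to ratings of “ability to discipline” made as part of a five-factor scheme and once to ratings made as part of a twenty-factor scheme, each against “general success,” so that the two coefficients may be compared.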

It is not the purpose here to imply that the several devices
now available for measuring teaching efficiency are wholly
unsatisfactory. Far from it. The plans of Elliott, Boyce, Landsittel,
Rugg, and others have been used with fair success. But such
systems as these are not in general use. The idea of this sort of
measurement is not very widely accepted as yet, and the practice
is still further short of being common over the country. To bring
together the fruits of our study to date, and to use their collection
as a starting point for an extended study in which [...]
practical school people will cooperate is our present need, and
should be our next step.


BIBLIOGRAPHY 


1. ANDERSON, W. N. “The selection of teachers,” Educational Administration
and Supervision, [...]-90, February, 1917.
2. BIRD, GRACE E. “Teachers’ estimates of supervisors,” School and Society,
[...], June 16, 1917.
3. BOOK, W. F. “The high school teacher from the pupils’ point of view,”
Pedagogical Seminary, 12:239-88, September, 1905.
4. BOYCE, A. C. “A method for guiding and controlling the judging of teaching
efficiency,” School Review Monographs, No. 6, pp. 71-82, 1915.
5. BOYCE, A. C. “Methods for measuring teachers’ efficiency,” Fourteenth
Yearbook of the National Society for the Study of Education, Part II, 1915.
(See also Journal of Educational Psychology, 3:144, March, 1912.)
6. BRADLEY, J. H. “A study of the relative importance of the qualities of a
teacher and her teaching in their relation to general merit,” Educational
Administration and Supervision, 4:358-63, September, 1918.
7. BUELLESFIELD, HENRY. “Causes of failures among teachers,” Educational
Administration and Supervision, 1:439-45, September, 1915.
8. CLAPP, F. L. “Scholarship in relation to teaching efficiency,” School Review
Monographs, No. 6, pp. 64-70, 1915.
9. CLARK, R. C. “A scale for measuring teachers,” American School Board
Journal, 62:39-40, February, 1921.
10. COFFMAN, L. D. “Committee on rating, placing and promotion of teachers,”
School Review Monographs, No. 6, pp. 61-63, 1915.
11. COFFMAN, L. D. “The rating of teachers in service,” School Review
Monographs, No. 5, pp. 13-24, 1914.


[...] 7:451-59, April 20, 1918.
[...] “[...],” School Review, 24:641-47, [...].
JONES, [...]. “[...] suggestion for teacher measurement,” School and Society,
6:321-22, September 15, 1917.
KENT, R. A. “What should teacher rating schemes seek to measure?” Journal of
Educational Research, 2:802-7, December, 1920.
KRATZ, H. E. Studies and observations [...]. Chicago: [...] Publishing
Company, 1907, chapter [...].
[...], F. C. “Evaluation of merit in high [...],” [...], 6:774-80, December,
1917.
LANDSITTEL, F. C. “A score card method of [...],” Educational Administration
and Supervision, 4:297-309, June, 1918.
LITTLER, SHERMAN. “Causes of failure among elementary school teachers,”
School and Home Education, 33:255-56, March, 1914.
MOODY, FLOYD E. “Correlation of the professional training with the teaching
success of normal school graduates,” [...], 26:180-81, March, 1918.
[...], ROBERT [...]. “Qualities of merit in secondary teachers,” Educational
Administration and Supervision, 5:225-38, May-June, [...].
MOSES, C[...]. “Why high school teachers fail,” School and Home Education,
33:166-69, January, 1914.
P[...], H. [...]. “How [...] the ability of student teachers [...],”
Educational Administration and Supervision, 6:215-19, April, [...].
[...]. “Problems of teacher measurement,” Journal [...], [...]10, February,
1917.
[...]. “Testing grade teachers for efficiency,” Journal [...] National
Education Association, 1914, pp. [...].
[...]. “Selecting teachers and grading their efficiency,” [...] Board Journal,
49:11, 69, September, 1914.
RUEDIGER, WILLIAM C. “Agencies for the improvement of teachers in service,”
Bureau of Education Bulletin, 1911, No. 3.
RUEDIGER, WILLIAM C., and STRAYER, G. D. “[...] of merit [...],” [...],
1:272-78, May, 1910.
[...]. “[...] administration of practice [...],” [...], 37:8-13, September,
[...].
[...]. “[...] card for rating student-teachers in [...],” [...], 24:72-80,
March, 1917.
[...]. “[...], placing, and promotion [...],” [...], pp. 1-11, 1914.
[...]. [...] (score card). Institute for Public Service [...].
[...]. “A plan for rating the teachers in a school system,” [...], June 21,
1919.
[...]. “[...] shall receive increases in salary?” Elementary School [...],
March, 1920.
[...]. “[...] schools of Indiana,” School Review, 21:446-60, September, [...].
[...]. “A score card for rural teachers,” School and [...], September 11, 1920.
[...]. “The construction of a teacher rating scale,” [...] Journal, 21:361-66,
January, 1921.
WITHAM, E. C. “School measurement,” Journal of Educational [...], [...]88,
December, 1914.
[...]. “School and teacher measurement,” Journal [...], 5:267-78, May, 1914.


THE NEW PLAN OF ADMITTING STUDENTS AT
COLUMBIA UNIVERSITY¹


Transmitted by
EDWARD L. THORNDIKE


Teachers College, Columbia University


THE FIRST QUOTATION: DEAN HAWKES


[...]itely it is possible to determine with scientific accuracy whether or not the mental test is a useful addition to our academic [machinery]. If it turns out, during a series of years, that the correlation between the marks received on the mental tests and the collegiate work of the students is distinctly higher than the correlation between the results of other types of entrance [examination] and the college work, it would seem to be clear that [the new plan] of admission affords the best index that we have of [the ability of] a boy to carry college work. The correlation between [the work] of the entire freshman year for the students [admitted under the new] plan and their marks on the mental test is +0.65. The most reliable data available indicate that the highest correlation that can be expected between the work of the freshman year and the results of the usual college entrance examinations is about [figure illegible]. This latter figure has been obtained not only from a [careful] study of our own freshmen but from similar studies in [another] institution. Although it is too early to make a final statement regarding the matter, every indication points to the [mental test] as a most useful addition to our machinery of admission.

It must be kept in mind that the group of students admitted to college under the new plan are very carefully [selected] before they are authorized to take the mental test. The correlations obtained should, therefore, be interpreted as [applying] to the new plan of admission as a whole rather than to the mental test alone.


¹ At the public meeting of the National Association of Directors of Educational Research at Atlantic City, March 30, 1921, Professor Thorndike [describ]ed the new [plan of admitting] students to Columbia University. He based [his account] in part on [two quota]tions. The first is by Doctor Herbert E. Hawkes, Dean of Columbia [College]; the second is by Doctor Adam L. Jones, Director of University Admissions.


In addition, the use of the results of the mental [test ...] have been most helpful in my [...] diagnosis of academic maladjustment. [...] academic record and a low mental-test [mark ...] different treatment from the student whose mental-test mark is high. And [...] test has afforded the clue which [...] cooperation with the university physician [...] that he has not only escaped being dr[opped ...] an excellent academic citizen.

The use of a new instrument like the mental test [...] caution and scrupulous checking, but its possibilities for usefulness are so fundamental and far-reaching that a careful and scientific study of its significance is o[ne of the ...] tasks of the next few years.


THE SECOND QUOTATION: DIRECTOR JONES


It will be remembered that the new method permits [candidates] whose school and character records are satisfactory to us to substitute this examination for the entrance examinations. Candidates still have the option of entering by the old method [and] still have the privilege of substituting for the entrance examinations certain of the examinations given in schools by the New York State Department of Education. The records [of those] electing the older method of admission were scrutinized wit[h ...] requirements were very strictly enf[orced. ...] the New York State examinations were re[quired ...] or higher in each subject and admission with conditions was allowed only in the case of those [whose] outstanding excellence more than made good a technical deficiency.

This requirement was most strictly enforced in the case of those who came from the best high schools where the instruction [was] of the highest quality, and where in consequence there was less excuse for a doubtful record. A student with a poor record from a good school is usually a bad risk. Students coming from small or poorly equipped high schools were treated with greater leniency. A good student may fail to make a first-rate record [in a poor] school. Where there was room for doubt, the candidate was required to take the mental test.

By far the most significant group so far as our system of admission is concerned was the group entering by the new method. It was expected that this method would appeal strongly to promising and alert young men in places in which the New York State examinations are not given and where college entrance examinations are less well or favorably known than in schools from which our student body has usually been chiefly drawn. [This] expectation was more than realized; the number of candidates for the freshman class from a distance was much greater than [...], and those who applied under these conditions were [...] successful in the test. It should be remembered that [only] those whose school records and character records were [entirely] satisfactory were allowed to enter by the new method. [They] constituted, therefore, a picked group.

That they were a picked group is evident not only from the [credentials] which they presented for admission, but also from the records which they made in college. They have done remarkably well. There were, of course, borderline cases and a number of these have not turned out well, but in the group as a whole the failures were very few. Most of those who failed were students whose scores in the psychological examination were relatively low, but whose cases seemed to possess sufficient merit to warrant their being given a trial. Even in this group, most justified their admission to college. There were a very few with relatively high scores whose records in college were not wholly satisfactory. Careful examination showed that their failure to make first-rate records was due not to lack of intelligence nor to faulty preparation, but to failure to divide their time and energies properly among the many demands which come to the college student.

Meetings of instructors of freshmen regularly follow the making up of the mid-term and term records for the purpose of considering the cases of students whose records are unsatisfactory. In the meeting last November only two of the students among more than sixty whose records were unsatisfactory had made high scores in the psychological examination. The testimony of their instructors was unanimously to the effect that both students were fully able to do good college work. It appeared, however,


that one of them had [...] been devoting too great an [amount of time to] extra-curricular activities, and the other had been taking undue advantage of his first opportunity to become acquainted with a great city.

It has been shown that with remarkably few exceptions the higher a student's score in the psychological examination [...]. Several studies have been made in the course of the year by Mr. H. K. C[...]wick of this office. Others have been made by Mr. B. [D.] Wood of the Department of Psychology. In one of Mr. C[...]wick's studies he considered one hundred and [...] men made up of groups of ten, each group having psychological examination grades lying within two degrees in the scale. [...] the grades of the highest ten covered a [...] range. With this exception each two degrees in the scale [...] 106 was represented by a group of ten, the first ten alphabetically being taken in each case. The total amount of work done by each group was plotted by grades. The result for typical low, middle, and high groups is given below.

[Chart not recoverable: work done by typical low, middle, and high groups, plotted by grades.]

The progressive decrease [...] and increase [...] of [...] grades as we go from the low to the high groups [...]. It was found to hold throughout the groups, though there were a few exceptions. [...] with psychological examination grades [...] one group below [...] received A's [...] above 100 received F's; only one above 95 did so. [...] work of groups covering five degrees [...] dividing mark between the upper [...] the lower middle and upper middle quartiles [...] lower quartiles were studied [...].
= 


[Table not recoverable: dividing lines (grades) between the quarters of the work done by the low, middle, and high groups.]

[...] men with higher grades did a larger amount of high-grade [work and a smaller] amount of low-grade work, though the groups [from 1]00 to 105 were practically the same so far as this study [...]. It will be seen, for example, that for the men with [a psychological] examination grade of 105 “C” marked the division between the lowest fourth and the remaining work, while for those [with a] grade below 80 [...] marked the division between the [...] quarter and the remaining work; three-fourths of the [...] was as high as the highest fourth of the work of the first group.

[Another] study of the work of students in the second [...] of the session [...] a number of striking results, all of which went to prove [the] progressive superiority of the higher groups in the order of [their] grades. The following is typical:

GROUP OF FIFTY FOR EACH TEN DEGREES OF THE SCALE
[Table not recoverable: individuals out of each group of fifty and percent of group, by ten-degree step of the scale.]

It should not be supposed, however, that the psychological examination alone would give ideal results, particularly when the scores fall below 80. The range from 60 to 70 was regarded as extremely doubtful. While the complete school records of candidates were carefully examined, especially close attention was given to those whose scores were between 60 and 70, and the most promising were admitted. The records show that this group did work practically equal to that [of the group] for which the range [was] 70 to 80 and whose records had not been quite so carefully weighed.

The psychological examination alone would not be a fully satisfactory means of selecting students, but there has been no thought of using it without the student's complete previous record. Ordinarily the candidate's school record must show the completion of a school course covering the requisite entrance subjects with grades 10 percent or more above the school's passing mark. His personal record must show acceptable mental [and] moral qualities. Occasionally a student of especial promise with a record which is doubtful in certain particulars may be allowed to take the examination with the requirement that he pass with a very high grade.

It will be recalled that students who elect to enter by the old method take the psychological examination for purposes of record. This makes it possible each year to test the results of the psychological examination for practically every student in college.

A preliminary comparison of the relation between college record on the one hand and school record, entrance examinations, regents' examinations, and psychological examination on the other was made at the close of the winter session by Mr. Ben D. Wood [of] the Department of Psychology. The results are significant. Among the students admitted by the college entrance examinations a good many doubtful cases were included. The correlation between their examinations and their college records was +0.43, which is reasonably satisfactory. The correlation between school record and college record was +0.45. Those entering by regents' examinations were very carefully selected. The correlation between their examination and their college records was +0.5[digit illegible], while the correlation between psychological examination and college record was +0.59, a highly satisfactory result. This was for the first half-year only.
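
The coefficients quoted above can be illustrated with a short sketch. It assumes that the figures are ordinary product-moment (Pearson) correlations, which the text does not state outright, and it uses invented scores: `test_scores` and `college_averages` are hypothetical and are not data from the Columbia study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the two means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration only: one entrance-test score and one
# first-year average per student.
test_scores = [62, 71, 78, 85, 90, 96, 104]
college_averages = [68, 65, 74, 72, 80, 79, 88]

r = pearson_r(test_scores, college_averages)
print(f"r = {r:+.2f}")
```

A coefficient computed this way for each entrance measure against the same college record is what allows the measures to be compared on one footing, as the passage does.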


[A later] study for the work of the whole year shows a correlation between mental test and college record of +0.60, which was [...] good. The correlation for the other examinations [and for] school record has not yet been worked out. It should [be remembered] that there are many factors other than intelligence which determine a student's standing and that the psychological examination is not supposed to measure them.

The operation of the new system will be watched with the [...] and no opportunity for checking its results or improving the methods of using it will be lost. It may still be an experiment, but it is certainly not a doubtful one.


[...] METHOD OF EQUALIZING THE RATING OF
TEACHERS

[...] BYRNE
H Instru n, D / 
It is well known that ordinary marks given to pupils [...] considerable variation due to [...]. [...] A number of plans have been wo[rked out to] secure greater consistency in pupils' marks within a [...]. [...] principals in different buildings in the [...] pupils [...] information regarding individuals, and this produces a tendency toward consistency. The rating [of teachers], however, is likely to be made by each principal quite [independently]. Moreover, no teacher has more than one princi[pal ...] departmentalized school may have seve[ral ...].

[Differences] on the part of principals in the rating [...] may be described statistically under tw[o heads:] (1) differences in the general level of the ratings made [and (2)] differences in the spread of the ratings. The first has to do with the central tendency or average, and the second with [the] dispersion.

To eliminate the differences in the general level of ratings [made] by different principals, it is only necessary (assuming [the ratings] to be expressed numerically) to calculate the average [for each] school and for all schools combined and then to move each [school] average up or down to make it coincide with the average [of the] whole.

This procedure assumes that the real average of teaching ability in each school is the same and equal to the average [of the] whole system. This may not be true, though it is likely [to be] approximately so. The fact that the level of teaching ability [in] a given school appears unusually high or low is more likely [to be] due to liberal or rigorous rating than to an actual con[dition]. Moreover, if the level of teaching ability is really markedly ab[...]


[...] this condition is a defect which [should be] remedied.

[To eliminate] the differences in the [spread] of ratings, the [standard] deviation [is] found for each school [...]; then each teacher's position in the original [...] in terms of the [standard] deviation of [...] expressed in terms of the standard deviation of the whole [...] the average of the [...].

[...] method of attack on the problem of pupils' [marks ...] scheme of control by which the [teacher] will distribute [...] in certain prearranged proportions. The [...] of ratings just indicated consists in taking the [ratings as submitted] by the principals and calcu[lating ...] reduce [...] variability. It would be unduly labo[rious ...] to pupils' marks; but it is quite fe[asible as a] method of organizing crude ratings of te[achers ...] them in final form. It can be applied to [...] and will also be helpful [...] of organizing and making comparable [...].
An illustration is given here of the method of reducing crude [ratings] to uniformity of level and dispersion. In the particular [rating] scheme reported each teacher is marked on ten points with five degrees of quality, these degrees being represented [by] numerical values of 6, 8, 10, 12, and 14. Thus a teacher's minimum possible numerical rating would be 60 and the maximum would be 140. The method of equalization is equally applicable [to any] other rating scheme so long as numerical values are used.

The first step is to tabulate the crude ratings and calculate [for] each school and for the group of schools the arithmetic mean (average) and the standard deviation. This has been done for [a] system consisting of seven schools with results as shown in Table I.


TABLE I. CENTRAL TENDENCIES AND DISPERSIONS OF RATINGS [...] SCHOOLS

SCHOOLS                    1      2      3      4      5      6      7
Number of [teachers]      16     17     14     23     10     12     33
Arithmetic mean        107.1  121.6  108.4  114.8  110.6  120.2  123.4
Standard deviation       8.0   13.5    7.9   10.7    6.9   11.5  [illegible]


The second step is to take the arithmetic means and standard deviations found and construct a conversion table which [will] enable us to find for any crude rating the revised or standard rating desired. The conversion table is illustrated in Table II.

To construct a conversion table first place the values for [the] arithmetic means in the row marked 0.0σ. Then from this point measure upward and downward various intervals from +2[.5σ] to −2.5σ. A complete set of values for any number of subdivisions may be filled in, but in practice it is sufficient to put down [a] framework of values, e.g., at intervals of 0.5σ as in Table II, from which all individual marks can readily be calculated. It is [then easy] to see for any crude rating what the corresponding standardized rating would be in any proposed scale. Theoretically the crude values would be transformed into those of the “Composite” scale, based on the array of ratings from all the schools. In practice it will be less troublesome and equally serviceable to transform to a conveniently numbered scale such as “B” or “D,” which approximates the conditions of the composite. Or if preferred, a scale stopping at 100 can be employed, such as “A.”¹ In the last column of Table II five general subdivisions of quality [are] indicated; but any other number can readily be employed.
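
The conversion just described amounts to expressing each crude rating as a distance from its own school's mean in units of that school's standard deviation, and then laying the same distance off on the chosen scale. The sketch below works a single rating through that arithmetic rather than building the whole table. The School 1 mean and standard deviation are the Table I values; the target mean of 110 and standard deviation of 10 are assumptions made for illustration, since the constants of scales A, B, and D are not legible in this copy.

```python
def standardize(crude, school_mean, school_sd, target_mean, target_sd):
    """Re-express a crude rating on a target scale.

    The rating keeps its position in sigma units: its distance from the
    school mean, in school standard deviations, is laid off from the
    target mean in target standard deviations.
    """
    if school_sd <= 0 or target_sd <= 0:
        raise ValueError("standard deviations must be positive")
    sigma_units = (crude - school_mean) / school_sd
    return target_mean + sigma_units * target_sd


# School 1 values from Table I.
SCHOOL_1_MEAN = 107.1
SCHOOL_1_SD = 8.0

# Assumed target scale (hypothetical constants, not read from Table II).
ASSUMED_TARGET_MEAN = 110.0
ASSUMED_TARGET_SD = 10.0

revised = standardize(123, SCHOOL_1_MEAN, SCHOOL_1_SD,
                      ASSUMED_TARGET_MEAN, ASSUMED_TARGET_SD)
print(round(revised))  # 130
```

With these assumed constants a School 1 rating of 123 lies just under two standard deviations above its school mean and comes out at 130. That agrees with the figure the article quotes for scale B, which suggests, but does not establish, that scale B was built on a mean of 110 and a standard deviation of 10.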

According to Table II, a teacher in School 1 who was rated 123 would be entitled to a rating of 140 (to the nearest whole number) on the composite scale, 143 on scale D, 130 on scale B, and 95 on scale A. On the other hand, a teacher in School [number illegible], due to the more liberal rating prevailing in that school, who was


¹ However it is advisable to avoid the use at any time of a marking scale which stops at 100 percent; the preconceived idea that marks must fall between 80 and 100 or even 90 and 100 is hard to resist.


TABLE II. CONVERSION TABLE FOR CRUDE RATINGS
[Table printed sideways in the original; entries not recoverable.]


rated 149 would be entitled to no higher ratings on th[e ...] scale. [...] rating obtained in any school can be r[...]. Interpolation [is], of course, [necessary ... Table] II, intervals are as large as 0.[...].

[...] a conversion table in order [to] standardize the ratings [... shown] in Table III. [Under the caption] “Crude Scale” the [...] ratings are entered according to t[he ...] several schools. It is [from these] ratings that the arithmetic means and standard deviations [... were] computed. Under the caption “Scale B” [the] same ratings are shown when converted into the units [of that scale]. The larger differences both in general level and in [dispersion, due] mainly to different ways [of rating,] are greatly reduced when the uniform scale is used.
TABLE III. DISTRIBUTION OF CRUDE RATINGS AND STANDARDIZED RATINGS
[Frequency table by rating (rows) and school (columns), under the captions “Crude Scale” and “Scale B”; cell entries not recoverable. Column totals for Schools 1 to 7: 16, 17, 14, 23, 10, 12, 33; all schools, 125.]


Figure 1 possibly shows the character of the situation more clearly. In the crude ratings Schools 2, 4, and 7 have a large proportion of their marks higher than the maximum attained [in] Schools 1 and 3. But in Scale B it is evident that there is [a more] balanced distribution.

The writer would not claim that the assumptions on which [the] suggested method of organizing crude ratings is based correspond exactly to the truth, but he would submit that if principals [are] to file teachers' ratings year after year, these records [will be] rendered far more reliable, significant, and valuable if they [are] subjected to some statistical treatment such as the one illustrated.


FIGURE 1. COMPARISON OF CRUDE RATINGS AND RATINGS [ON] UNIFORM SCALE B. DATA FROM TABLE III
[Chart not recoverable: ratings plotted by school, for the crude ratings and for Scale B.]


COOPERATIVE CHEMISTRY TESTS¹


[...] HAYES


East Technical High School, Cleveland, Ohio


BASIS OF THE WORK


The chemistry teachers of Cleveland have been cooperating [for about] a year in an effort to formulate questions which would [be a] measure of the information in chemistry possessed by their [pupils]. Questions upon limited portions of the semester's assignment were sent out from time to time to the different teachers for [use in] their classes. A record was made for each question on the [basis of] returns from the teachers. These questions were gleaned [from] different sources, many being submitted by the teachers [themselves]. As a consequence, there are on hand some five [hundred] questions based on the textbook in use (McPherson [and] Henderson), which have been tried out in three or more [schools] and have been attempted by at least one hundred pupils. [It is] intended that this work shall be followed up until about eight hundred questions have been standardized for the subject [as] presented in Cleveland.

The cooperative feature of these tests is in itself of the greatest [value]. We are trying to prepare [a standard] measure for use in our own work. Already good effects have come from the scrutiny which [we] have individually given to our work as a result of this united effort. New projects which have developed directly or indirectly [from] this work await their turn for consideration and treatment.


PREPARATION OF QUESTIONS


The questions prepared for these tests called for equations [or] single-word answers. It was the aim that they should conform [to] the requirements for such tests as laid down by Dr. J. Crosby Chapman.²


¹ In connection with this study, appreciation is due and very gladly expressed to [the] teachers for their aid and interest, and for their kindly adaptation to the conditions [of the] tests, which often interfered with their customary individual procedure; also to [the] headquarters staff, especially Mr. Welles, for sympathetic support. Particular [thanks] are due to Dr. Chapman, at whose suggestion the work was undertaken and [under] whose advice it was carried on.

² Chapman, J. Crosby. “The measurement of physics information,” School [Review], 27:748-56, December, 1919.




| pre ed the sis based 
q ere his In order » pl 
| ‘ ( hese ¢ 
( in eyvl ciminar 
} } 


\ 
] i Ot these I 
i ] 
{ cy cs . 
Ail 
{| 
‘ ‘ re dan 
l l ts | 
; 1 tO exact wers na the ! 
| In no way to be considere 
L 4 tion, but it has m dvan 
‘ won | 1 aug ] 
to eon the intory Lion yp. 
. 
| ( ist! Live well lo « | 
‘ 
ded In cn set which 4 is Wrpose} 
+] ‘ } 
e init usual pro Lest but Vnici 
of this sty] test cultiy 
use ol] Mis style ot test cultivate 
| ul Lo dittere nuiate values. not on 
the tests, but in regular re itations as well 


— 
q 
the various schy 
ers Ol the committe 1! 
Line Cl t pore ni ind 
| 
| ICSLIO] en, althoug 
iif | ite og | ] 
ion of the indivi chers. Thi 
yeni questions used is as follow 1) Prelij 
| 
| 


DIRECTIONS TO TEACHERS

The following directions were given to the teachers for conducting the tests:

Method of Administering the Test.

1. Before starting, instruct the pupils to fill in the blanks at the head of the “Answer Sheet,” Figure 1, except the “Total Right.”


FIGURE 1. ANSWER SHEET
[Form not recoverable: blanks for School, Topic, and Total Right, and numbered spaces 1 to 20 headed “Result” and “Answers.”]


2. Instruct the pupils to write their answers in the large spaces which are headed, “Answers.” Each answer should be in the space which corresponds to the number of the question given. The answers should be in one word, if possible. Under no circumstances should any writing be done in the “Result” spaces.

3. Read each question to the class slowly and clearly; then repeat it immediately. Watch your pupils to judge the time needed in answering the question in hand. Do not change, explain, or omit any of the questions.

4. Before collecting the papers, reread the questions, allowing corrections to be made.


5. When the time is up on the last question, stop [...]. Mark the correctness of the answers in the “Result” spaces, [...] by means of a large check if correct, and by a large [...].

6. Count the number of correct answers on the sheet [and enter this] number in the space labeled “Total Right.”

Use to be Made of the Final Scores of Each Pupil

7. On the “Summary Sheet,” Figure 2, enter the “Scores [of] the Questions.” This will indicate the number of pupils [in each] class who answered the indicated question correctly.

8. Also obtain and enter the “Scores of Pupils.” This [will] indicate the number of questions answered correctly [by each] pupil in each class.

9. The balance of the sheet will be filled out by th[e one] who is conducting the experiment.

NOTE: The “Answer Sheet,” Figure 1, was prepared [for] convenience in marking. These sheets can be easily handled [in] one of two ways: (a) If stacked they can be fingered over r[apidly], the answers to two or three questions being marked at ea[ch turning] over, or (b) the sheets can be spread out laterally so as to [show] either the right or left halves of all the sheets and then one [half] of all of the answers can be most rapidly marked. The “Summary Sheet,” Figure 2, was prepared for the sake of uniformity [in] the making of reports.
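
The marking and tallying called for in directions 5 to 8 can be sketched as follows. This is only an illustration of the bookkeeping: the three-question key reuses answers printed in the first-semester test, while the pupils' answers, the function names, and the rule of ignoring capitalization are assumptions introduced here.

```python
from collections import Counter


def mark_sheet(answers, key):
    """Return one True/False mark per question for a single answer sheet.

    `key` holds, for each question, the set of accepted answers.
    """
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    return [
        answer.strip().lower() in {k.lower() for k in accepted}
        for answer, accepted in zip(answers, key)
    ]


def summarize(class_sheets, key):
    """Tally a class: total right per pupil, pupils right per question,
    and how many pupils made each total score."""
    marks = [mark_sheet(sheet, key) for sheet in class_sheets]
    totals = [sum(m) for m in marks]                      # "Total Right"
    per_question = [sum(m[i] for m in marks) for i in range(len(key))]
    distribution = Counter(totals)                        # pupils per score
    return totals, per_question, distribution


# Key taken from three of the printed questions (accepted answers only).
key = [{"Distillation"}, {"Water", "H2O"}, {"Greater"}]

# Hypothetical answer sheets for three pupils.
class_sheets = [
    ["Distillation", "Water", "Greater"],
    ["Boiling", "H2O", "Less"],
    ["distillation", "salt", "greater"],
]

totals, per_question, distribution = summarize(class_sheets, key)
print(totals)        # [3, 1, 2]
print(per_question)  # [2, 2, 2]
```

The per-question counts correspond to the “Scores of the Questions” on the Summary Sheet, and the distribution of totals to the “Scores of Pupils.”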


THE TESTS: QUESTIONS AND ANSWERS

The questions are here arranged in order of increasing difficulty as shown by the results of the tests. The question numbers indicate the original order.


FIRST SEMESTER TEST


Ground covered: McPherson & Henderson, chapters i-xv[...]


VALUES  QUESTIONS AND ANSWERS

1.100   2. How is pure water usually prepared in the laboratory from impure water?
        Answer: Distillation

1.752   1. If a piece of magnesium is burned, how does the weight of the resulting solid compare with the weight of the original piece of magnesium?
        Answer: Greater

2.469   6. In preparing oxygen from potassium chlorate, manganese dioxide is added to the chlorate. What kind of an agent is the manganese dioxide in this reaction?
        Answer: Catalytic, Catalyzer

[...]     11. In putting out a fire, one of the three factors of combustion must be removed. In [the] use of a hand fire-extinguisher, which one factor is removed?
        Answer: Supporter

[...]52   12. What is formed in addition to nitric oxide and oxygen when nitric acid decomposes in its usual manner as an oxidizer?
        Answer: Water, H₂O

[...]     13. What is hydrated ammonia?
        Answer: Ammonium hydrate, Ammonium hydroxide, Aqua ammonia, NH₄OH

[?].926   9. What compound is always formed by neutralization?
        Answer: Water, H₂O

[...]     8. Unless adjustments are made by the pilot, a balloon tends to descend towards evening and to rise toward midday. By the aid of what gas law can this be explained?
        Answer: Charles', Gay-Lussac's

[?].000   3. When hydrogen is passed over hot copper oxide a chemical change takes place. What is the action of the copper oxide?
        Answer: Oxidizing agent; Oxidized the hydrogen

[...]     10. If nitrogen were prepared by burning [out] the oxygen from some air, what very inactive element would make up about 1/79 of the unconsumed gases?
        Answer: Argon, A

3.224   5. Write a molecular reaction expressing the oxidation of an element which shows that water is a product of combustion.
        Answer: 2H₂ + O₂ = 2H₂O

[?].153   15. The valence of the element “Y” is −4 and that of the element “X” is +3. Write the formula of the binary compound which these elements could form.
        Answer: X₄Y₃

[?].531   17. Ammonia will escape from a bottle of ammonium hydroxide as long as the bottle is unstoppered. If tightly stoppered equilibrium is soon established. Express this equilibrium in a reversible reaction.
        Answer: NH₄OH ⇌ NH₃ + H₂O

3.531   7. From what compound can pure nitrogen be prepared by heat?
        Answer: Ammonium nitrate, NH₄NO[subscript illegible]

3.652   14. Complete and balance the following reaction: Ca(OH)₂ + H₃PO₄ →
        Answer: 3Ca(OH)₂ + 2H₃PO₄ = Ca₃(PO₄)₂ + 6H₂O, or Ca(OH)₂ + H₃PO₄ = CaHPO₄ + 2H₂O


[?].693   [number illegible]. Write the equation for the complete [action] between copper and hot concentrated sulphuric acid.
        Answer: Cu + H₂SO₄ [remainder illegible]

[...]     4. Write the equation for the reaction which takes place when nitric acid is prepared at moderately low temperature.
        Answer: NaNO[...] NaHSO₄ [remainder illegible]

[...]     16. What do we call the action [of water] upon salts by which a base and [an] acid are formed?
        Answer: Hydrolysis

[?].900   19. Write the equation for the action which takes place when hydrogen sulphide is passed into concentrated nitric acid liberating sulphur.
        Answer: 2HNO₃ + 3H₂S = 4H₂O + 2NO [...]


SECOND SEMESTER TEST

Ground covered: McPherson and Henderson, chapters xvi[...]


VALUES  QUESTIONS AND ANSWERS

1.012   2. What process beside the acid-Bessemer process is generally used in the United States in producing the steel of commerce?
        Answer: Basic Open [...]

2.046   10. When iron ores are mixed with a suitable flux and are reduced with coke what is the main product?
        Answer: Cast (Pig) [...]

[?].136   5. Write the equation for the reaction which takes place between chlorine and [...].
        Answer: 2H₂O + [2]Cl₂ = 4H[Cl + O₂]

2.136   15. What element because of its affinity for oxygen is most generally used in metallurgy as a reducer?
        Answer: Carbon, C

2.180   6. What metal which is lighter than water will decompose water and set free half of the hydrogen without the hydrogen taking fire if the water is cold?
        Answer: Sodium, Na

2.307   13. The insoluble soap which gathers on the top of hard water in washing is apt to be [a] salt of what metal?
        Answer: Calcium, [Ca]; Magnesium, Mg

2.[...]   11. We breathe out carbon dioxide from our lungs. The foods and tissues of the body are subjected to what chemical process to produce it?
        Answer: Oxidation

2.776   14. What material besides ore, fuel and flux must be used in the metallurgy (smelting) of iron ore?
        Answer: [illegible]

[?].963   8. What product besides carbon dioxide forms when magnesium carbonate is heated?
        Answer: Magnesium oxide

FIGURE 2. SUMMARY SHEET
[Form not recoverable: blanks for School, Topic, and Teacher; a table of “Scores of Pupils” (how many pupils scored 0, 1, 2, 3, and so on up to 20), by section, with total and percent; and a table of “Scores of the Questions” (how many answered the first question correctly, the second, etc.), by section, with total and percent.]


3.037   1. What do we call those elements whose hydroxides are bases?
        Answer: Metals

3.074   12. On developing a photographic plate by what chemical process is the metallic silver deposited in the film?
        Answer: Reduction

[?].112   17. If sulphuric acid and sodium bromide react, what element is likely to be liberated when the sodium sulphate and hydrobromic acid are formed?
        Answer: Bromine, Br

[?].261   7. What common commercial substance is formed when sodium carbonate, quick lime and an excess of quartz are fused together?
        Answer: Glass

[...]     9. Into what is calcium carbonate changed if an excess of carbon dioxide is passed into water in which the carbonate is suspended?
        Answer: Calcium bicarbonate, Calcium hydrogen carbonate, Calcium acid carbonate, Ca(HCO₃)₂, CaH₂(CO₃)₂

[...]     3. Type-metal has the ability to expand on solidifying, making a fine casting. This ability of the alloy is due to what metal?
        Answer: Antimony, Sb

[...]     19. What ion is liberated in excess by hydrolysis when washing soda is dissolved in water?
        Answer: Hydroxyl, OH

[...]     4. What is the approximate weight of 11.2 liters of carbon dioxide?
        Answer: 22 g.

[...]     16. Write the equation for the complete combustion of the third member of the methane (marsh gas) series.
        Answer: C₃H₈ + 5O₂ = 3CO₂ + 4H₂O

[...]     18. What active parts of acetic acid are indicated by writing the formula of this acid as HC₂H₃O₂ instead of H₄C₂O₂?
        Answer: Ions

[...]     20. Write a reaction for the reduction of sulphuric acid to hydrosulphuric acid by a binary acid which is a strong reducing agent.
        Answer: H₂SO₄ + 8HI = 4H₂O + 4I₂ [...]

[Values for questions 9 to 20 are only partly legible and cannot be matched to their questions: [?].376, [?].49[?], 3.778, 4.196, 4.248.]


RESULTS


These tests were given to the pupils, boys and girls, o[f ...] academic and technical high schools, who had just completed the ground covered by the tests. The results are based on the work [of] 581 first-semester and 268 second-semester pupils.


COOPERATIVE CHEMISTR} 117 


both weighted and unweighted scores have been
compared, Figures 3 and 4. In weighting the
[...] of Dr. B. R. Buckingham for "scaling"* was
[...] determined the probable error values, the weighted
[...] obtained by taking the P. E. of 0.000 as equal to
[...] values to [...]
the obtained P. E.
[...] this base. The results of this weighting are given in
[...]
[...] the difficulty of the questions in the two tests,
[...] values can be tentatively taken as they stand
[...] some of the questions from the first-semester
[...] given to the more advanced pupils would have shown
different values, in some instances greater, in others
[...]
Further use of the questions will reveal the nature of these
[...]


[...] test was limited to the work of a single semester so that
[...] nothing in these tests, as given, to indicate the growth
[...] they were given to different groups, one of which
had received training for one semester and the other for two.
A distribution of the pupils in both tests is given by schools
[...] in Table II. Schools A and B require chemistry of
[...] A in the tenth year, B in the eleventh year, while the
[...] make it elective in the twelfth year. The twelfth-year
[...], with a negligible exception, have had this chemistry
preceded by a year of physics, a condition which does not exist in
Schools A and B. A score of 14 is the approximate limit of the
[...] their pupils (both tests considered) exceeding this
[...] whom three were able to make a score of 16. Of the pupils
who elected to take chemistry only five made scores of more than
[...] three being perfect scores. The scores bear out the fact that
differences in ages and preparation, and the privilege of
election, are distinct handicaps to the younger pupils.
Figure 3 should be read as follows: School A had 242 pupils
take the test; their average unweighted score was 7.68, and their
average weighted score was 20.79; the average pupils per class
were 27; the subject was required of all pupils during their tenth
year and before the subject of physics was taken. Etc.

* Buckingham, B. R. Spelling ability: its measurement and distribution.
(Teachers College, Columbia University Contributions to Education, No. 59.)
New York: Teachers College, Columbia University, 1913.





Figure 4 should be read as follows: School A had 144 pupils
take the test; their average unweighted score was 8.31, and their
average weighted score was 20.38; the average pupils per class


[Figure 3: bar chart by school. Key: average weighted scores; average unweighted scores. Rows beneath the bars: average size of class; year in which chemistry is taken; whether chemistry is required or elective; whether the pupils have had physics.]
FIGURE 3. CHEMISTRY TEST, FIRST SEMESTER, JANUARY, 1921



were 21; this school required the subject of chemistry of all pupils
during their tenth year and before the subject of physics was
taken. Etc.
In preparing Figures 3, 4, and 5 the average score of School A,
First Semester Test, was taken as a base. A comparison of the
weighted and unweighted scores, Figures 3 and 4, reveals




younger pupils of the tenth and eleventh years lost 
while the older pupils of the twelfth year generally 


gain in chemical concepts and reasoning ability as


[Figure 4: bar chart by school. Key: average weighted scores; average unweighted scores.]
FIGURE 4. CHEMISTRY TEST, SECOND SEMESTER, JANUARY, 1920


[...] course proceeds is thus clearly indicated. This is more
[...] in Figure 5 where the averages of the two groups are
compared for each semester.


AVERAGE SIZE OF CLASSES
The average size of the classes which took these tests ranged
from 11 to 27, while the actual size ran from 11 to 32 pupils per
class. The variation in the sizes of the classes within the limits
given had no consistent effect on the scores of the various classes
or schools. This is according to accepted data where the classes
do not generally exceed 25 pupils per class.
[Figure 5: bar chart of average weighted scores. Key: first semester; second semester. Groups shown: Schools A and B (chemistry required of all; these pupils have had no physics); Schools C, D, E, F, G, and H (chemistry taken in the twelfth year; chemistry elective; pupils have had physics); average for all schools.]
FIGURE 5. CHEMISTRY TEST, FIRST AND SECOND SEMESTERS,
JANUARY, 1920. WEIGHTED SCORES


Figure 5 should be read as follows: The pupils of Schools A
and B take the subject of chemistry in either the tenth or eleventh
year. They had average weighted scores as follows: first semester,
19.27; second semester, 20.56. The average of all pupils taking
the first semester test was 25.05, and the average of all those taking
the test in the second semester was 25.27. Etc.


\ 


THE RELIABILITY OF THE BINET SCALE AND
OF PEDAGOGICAL SCALES¹
ARTHUR S. OTIS
Washington, D. C.
AND
HERBERT E. KNOLLIN²
Leland Stanford Junior University


In connection with a study made by one of the writers (Knollin)
of the intelligence of 180 adult males, it was suggested by
Dr. Terman that it would be desirable to determine the reliability
of the scale used, which was the 1915 edition of the Stanford Revision
of the Binet Scale. The 180 subjects included 150 migrating
unemployed and 30 business men. The usual precautions
were taken in the administration of the tests.

As a measure of the reliability of the scale it was proposed to
find the probable error of its score (the expression "probable
error" being used in a restricted sense). This was found to be
approximately six months in mental age. That is, in 50 percent
of cases, mental ages of adults may be assumed to be correct
within six months. It follows from this, theoretically, that in 90
percent of cases the score will probably be correct within 15
months, and in only one case in a hundred will the error probably
be in excess of 23 months.³
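
The step from a probable error of six months to the 15-month and 23-month limits can be checked with a short calculation. The sketch below assumes, as the word "theoretically" suggests but the text does not state, that errors of measurement follow the normal curve, so that the probable error is the point exceeded by half the errors. The function name `error_limit` is illustrative only.

```python
from statistics import NormalDist


def error_limit(probable_error, coverage):
    """Half-width within which `coverage` of normally distributed errors fall.

    Assumption (not stated in the article): errors are normal, so the
    probable error is the 75th-percentile point of the error curve.
    """
    std_normal = NormalDist()
    pe_in_sigmas = std_normal.inv_cdf(0.75)        # about 0.6745 sigma
    z = std_normal.inv_cdf(0.5 + coverage / 2.0)   # two-sided limit in sigmas
    return probable_error * z / pe_in_sigmas


limit_90 = error_limit(6.0, 0.90)   # a little under 15 months
limit_99 = error_limit(6.0, 0.99)   # a little under 23 months
print(round(limit_90, 1), round(limit_99, 1))
```

Under this assumption the 90-percent limit is about 2.44 probable errors and the 99-percent limit about 3.82 probable errors, which agrees with the rounded figures in the text.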


¹ Involving a determination of the "probable error" of a mental age by the Binet
Scale, an example of the use of a difference formula for correlation, and a discussion of
logical and mathematical aspects of the reliability of scales for measuring "mental"
and "pedagogical" ability. This article was written in 1916. Publication was delayed
on account of the war.
² Knollin is responsible for the testing and scoring involved in obtaining the
material for this study; Otis is responsible for the method and proofs.
³ More recently Mr. Virgil E. Dickson of Stanford University, using the same
method as that described herein, has found that the probable error of a mental age
when the Stanford Revision is used with first-grade children, chiefly six to eight years
of age, is approximately three months. Though less in absolute amount, this is about
the same proportion of the mental age. For a child of seven years, an error of three
months in mental age is one of 3.57 points in intelligence quotient. Taking 14 years
as the median mental age of our miscellaneous adults, an error of six months in mental
age is the same error of I. Q. This indicates that the probable error of a score varies
with the amount of the score, and suggests that the probable error of an I. Q. is probably
approximately constant, being about 3½ points. From this it would follow theoretically
that an I. Q. by the Stanford Revision is probably in error to the extent of
about 6 points or more in a quarter of the cases, 10 points or more in one case in ten,
and 14 points or more in one case in a hundred.
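
The arithmetic of this footnote can be reproduced in a few lines. The sketch assumes that the error in I. Q. is taken as 100 times the error in mental age divided by the age used as the base (seven years for the child, the median mental age of 14 years for the adults, as the footnote does). The function name `iq_error` is illustrative only.

```python
def iq_error(mental_age_error_months, base_age_months):
    """Error in I.Q. points produced by a given error in mental age.

    Assumption: I.Q. = 100 * mental age / base age, both in months.
    """
    return 100.0 * mental_age_error_months / base_age_months


child = iq_error(3, 7 * 12)    # three months at seven years
adult = iq_error(6, 14 * 12)   # six months at fourteen years
print(round(child, 2), round(adult, 2))
```

Both cases reduce to the same fraction, which is why the footnote treats the probable error of an I. Q. as approximately constant.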




In finding the probable error of a score, a line of relation
between the scores by the two halves of the scale was drawn
by the "method of rank correspondence" described elsewhere.⁴
The determination of a coefficient of correlation between two
series of scores by a "Difference Method" is illustrated.⁵

The method deemed proper for expressing the degree of
reliability of a scale or test is to give the probable error of a score
rather than to give only the coefficient of correlation between two
series of scores or between degrees of difficulty of the elements of the
scale for different groups of individuals.

The maker of a scale, whether pedagogical or intelligence,
should give, therefore, in the interest of a proper interpretation
of results, the probable error of a score by the scale, as an
expression of its reliability.


THEORETICAL CONSIDERATIONS WITH REGARD TO METHOD


There are various ways in which we may conceive of the
reliability of the Binet Scale. We may ask:

1. What is the probable deviation of a mental age by the Binet
Scale from the average mental age that would be obtained by the
same examiner testing the same individual many times, assuming
no effect remained in any case from the previous testing? This
deviation would result from fluctuation of attention, etc., on the
part of the examinee, and from possible differences in method of
giving the tests on the part of the examiner. (Concept 1.)

2. What is the probable deviation of a mental age by the Binet
Scale from the average mental age that would be obtained by
many different examiners testing the same individual with different
scales made as nearly as possible like the present Binet Scale?
This deviation would result not only from the above causes but
also from the differences in personality of testers, and from the
impossibility of making two scales exactly alike. It would therefore
be greater than the first deviation. (Concept 2.)

3. What would be the probable deviation of a mental age by
the Binet Scale from the true mental age of the individual as


⁴ See Reference 4 in the bibliography at the end of this article.
⁵ Since this formula was first presented by one of the authors (Reference 4), he
has made diligent search in the literature in order to discover whether the
formula had been presented before. Such prior presentation has not been found;
the formula in its present form is believed to be original with Otis.

on th 


determined by a hypothetical scale which measured the intelligence
perfectly, according to some definition? The average of a
number of measures of the intelligence of an individual by
different testers using the same and different scales would not be
free from the influence of environment, etc. Accordingly, this
deviation would doubtless be even greater than the second. It
would be the true measure of the probable error of a mental age
by the Binet Scale. (Concept 3.)

We have no means of measuring this true probable error since
we have no true measure of the intelligence of the individual.
We have no way of measuring even the second-mentioned deviation,
since we do not have different independent Binet scales.
Moreover, a second testing by a different examiner would give
results somewhat affected by the first testing, and for this reason
even the first-mentioned deviation must be found in an indirect
way.


Still another point to be noted here is that the probable error
of a single score (in the sense of being the probable deviation of
the score of any test from the average of a large number of scores
of the same individual by the same scale) is less than the probable
deviation of any one score from another. The former is, in fact,
equal to $1/\sqrt{2}$ times, or 0.707 of, the latter theoretically. (See
Appendix I for proof.) It is this probable error of a single score
that we are seeking ultimately and to which we shall refer when
speaking of the probable error of a mental age.

For the purpose of finding the probable difference between
any two measures of the mental age of an individual by the
Binet Scale, made by the same tester and assuming no lasting
effect of the first testing, we are obliged, since we have but one
scale, to divide it into two halves and find first the probable or
median difference between the mental ages of single individuals
by the two halves of the scale. Upon theoretical grounds it may
be shown that the probable error of a mental age by the Binet
Scale as a whole is equal to $1/\sqrt{2}$ times, or 0.707 of, the probable
error of a mental age by one-half the scale. Therefore the probable
error of a mental age by the whole scale is equal to
$1/\sqrt{2} \times 1/\sqrt{2}$ times, or ½ of, the median difference between the
mental ages by the two halves of the scale and hence can be very
easily found from the latter by dividing it by 2. Proof of these
propositions is given in the Appendix (I and III).
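
The two propositions above can be illustrated by a small simulation. This is a sketch under assumptions that are not the authors' proof: the errors of the two half-scale scores are taken to be independent and normally distributed with equal spread, and the whole-scale score is taken as the mean of the two (doubled) half-scale scores. The name `simulate` and all numerical settings are hypothetical.

```python
import random
import statistics


def simulate(n=20000, sigma_half=8.0, seed=1):
    """Return (median half-scale difference, P.E. of a half, P.E. of the whole).

    Assumptions: independent normal errors of equal spread in the two
    halves; whole-scale error is the mean of the two half-scale errors.
    """
    rng = random.Random(seed)
    err_a = [rng.gauss(0.0, sigma_half) for _ in range(n)]   # errors, half A
    err_b = [rng.gauss(0.0, sigma_half) for _ in range(n)]   # errors, half B
    median_diff = statistics.median(abs(a - b) for a, b in zip(err_a, err_b))
    pe_half = statistics.median(abs(a) for a in err_a)
    pe_whole = statistics.median(abs((a + b) / 2.0) for a, b in zip(err_a, err_b))
    return median_diff, pe_half, pe_whole


median_diff, pe_half, pe_whole = simulate()
print(round(pe_half / median_diff, 2), round(pe_whole / median_diff, 2))
```

With these settings the first ratio should come out near 0.707 and the second near 0.5, which is the sense in which the probable error of the whole scale is found by halving the median difference between the halves.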




PROCEDURE

The scale was divided into two parts by placing the first half
of the tests of each age group in one scale which we have called
Scale A, and the second half of the tests of each age group in a
second scale which we have called Scale B. The values of the tests
in months were then doubled so that the mental age by each
half would be comparable to that by the whole. A point was
plotted as a small circle in Figure 1 for each individual, having
an abscissa equal to his score by Scale A and an ordinate equal to
his score by Scale B. A relation line was then drawn through
these points as shown in the figure, according to a method which
is called "the method of rank correspondence,"⁶ and which may be
called the single relation line to distinguish it from a regression
line of which there are two for any pair of variables. This method
is based on the assumption that, considering the values individually,
the median of one distribution most probably corresponds to
the median of the other, that the upper and lower quartile values
of one distribution most probably correspond to the upper and
lower quartile values respectively of the other, and similarly
the values having each of the other ranks in one distribution most
probably correspond to the values having the same rank in the
other distribution. For the purpose, therefore, of finding the
position of the line of relation between pairs of scores, certain pairs
of scores having the same ranks were selected. These were the
middle values of each consecutive five in each distribution.
Beginning with the upper end of the distribution of Scale B,
the third, eighth, thirteenth, eighteenth, etc. values were indicated
by blacking the centers of the circles; and the values having the same
ranks in Scale A were similarly indicated. By means of crosses,
points were then plotted in the quadrant whose respective abscissas
and ordinates were equal to the values having the third,
eighth, thirteenth, eighteenth, etc. ranks in each distribution. In
this way 36 points were plotted, the abscissa and ordinate of each
having the same rank in the two distributions. According to our
assumption, then, these points best indicate the trend of the line of
relation between the scores by the two scales. Inasmuch as there
seemed to be no marked indication of a curvature in the line of
relation, it was assumed to be rectilinear. Therefore a straight
line was drawn as nearly as possible through the crosses, and this
was assumed to be the line of relation between the scores by the
two scales.⁷ If this line may be assumed to be correctly placed,
the ordinate of each point on the line represents the score by
Scale B which corresponds to the score by Scale A represented by
its abscissa. By means of the relation line, then, every score by
Scale A may be transmuted into terms of Scale B in order to
compensate for differences in difficulty between the two scales. The
difference between any actual score by Scale B and the score of
the same individual by Scale A, after this has been transmuted
into terms of Scale B, is one of the differences of which we are
seeking the median. The value of this difference is represented
by the distance of the point for this individual above or
below the line. The median of these distances, therefore, is
the probable difference, in terms of Scale B, between the scores
of the several individuals in the two halves of the Binet Scale
when the difference in difficulty between them has been
compensated for.

[Figure 1: plot of the score of each individual by Scale A (abscissa) against his score by Scale B (ordinate), with the crosses and the relation line; the distribution of the distances of the points from the line is shown at the right.]
FIGURE 1. THE CORRESPONDENCE BETWEEN THE MENTAL AGES
OF THE 180 INDIVIDUALS BY THE TWO HALVES OF THE
BINET SCALE (STANFORD REVISION)

⁶ See also Reference 4.
⁷ This line is then presumably approximately as would be the line,
$y = \frac{\sigma_y}{\sigma_x} x$, if that line passed through the means of the two
distributions. To draw this latter line, however, would have been to assume
that the line of relation was a straight line without first determining
whether it was or not. The above method makes no a priori assumption.


The distribution of the distances of the points above and below
the line is shown in Figure 1 at the right. In order to find
the median of these differences graphically with a [...]
degree of accuracy, it was necessary to construct Figure 2, in
which the distribution of the differences has been plotted
at the left using a larger scale and with both plus and minus
differences measured upward. These have been plotted again
at the right with each difference measured on a separate ordinate,
the ordinates increasing in magnitude to the right. A
curve, being one-half an approximate ogive, was drawn
through these points as shown. The midpoint of this curve,
measuring horizontally, was then found by erecting an ordinate
between the ordinates of the ninetieth and ninety-first differences.
This point may be seen to represent a difference of approximately
10.8 months in mental age. As a check upon this method, the
average of the differences was calculated and found to be [...]
months. Multiplying this by 0.8453⁸ gave [...] months,
which is very nearly 10.8, as the theoretical median difference.
These measures, of course, are in terms of Scale B. To find
the interval in terms of Scale A corresponding to 10.8 months
in Scale B, it is necessary to multiply 10.8 by the cotangent
of the angle of the line of relation with the horizontal,
since this is the ratio of the projections of any section of the line
of relation upon the two scales. In this case the cotangent is
0.96. Multiplying 10.8 by 0.96 gives 10.4 months as the median
difference between scores in terms of Scale A. We may now
average the two values, 10.8 and 10.4, which gives 10.6 months as the
median difference between scores [...]. As was stated above, the
probable error of a mental age by the Binet Scale according to
Concept 1 is equal to one-half of this and therefore approximately
5.3 months.

In this connection, however, [...] the highest [...] those in the
neighborhood [...]. [...] differences between the scores by the two
[...] not very great since very few tests [...] these differences are
less than they would [...] been a larger number of the more difficult
tests. [...] no doubt render our obtained value of the probable error
somewhat [...] its true value. It is thought likely, [...] that
[...] would be more [...] the true probable error of a score by the
Stanford Revision of the Binet Scale according to Concept 1.⁹

[Figure 2: distribution of the differences, positive and negative, between the scores by the two halves of the scale (left), and the same differences plotted in order of magnitude with a curve drawn through them (right); the median difference is marked.]

⁸ The median deviation of a normal distribution equals 0.8453 times the average deviation.
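
The chain of arithmetic in this passage can be followed in a few lines. The sketch uses only the figures printed in the text (10.8 months, a cotangent of 0.96) and assumes the normal curve for the 0.8453 factor; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Ratio of the median deviation to the average (mean absolute) deviation
# of a normal curve, both expressed in units of the standard deviation.
median_dev = NormalDist().inv_cdf(0.75)
average_dev = math.sqrt(2.0 / math.pi)
ratio = median_dev / average_dev                 # about 0.8453

median_diff_b = 10.8                             # months, Scale B terms (text)
cotangent = 0.96                                 # slope factor given in text
median_diff_a = median_diff_b * cotangent        # about 10.4 months
mean_median_diff = (median_diff_a + median_diff_b) / 2.0
pe_whole = mean_median_diff / 2.0                # about 5.3 months
print(round(ratio, 4), round(median_diff_a, 1), round(pe_whole, 1))
```

Carrying the unrounded product through gives a probable error just under 5.3 months, in agreement with the rounded value reached in the text.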
THE IMPROPER USE OF THE COEFFICIENT OF CORRELATION AS A
MEASURE OF RELIABILITY

In this connection it might be well to make reference to a
common improper use of the coefficient of correlation as an expression
of the reliability of a measure or [...]. The method deemed
proper for expressing the degree of reliability of
[...], as in this study, is to give the probable error of a score in
the terms in which the score is expressed. The difference between the
[...] may be illustrated as follows: The correlation between
the scores of the 180 individuals by the two halves of the
scale has been calculated by the Pearson formula and found to be
0.85 when all cases were considered. Calling the case of all
individuals Case 1, then as Case 2 were considered only those
individuals whose mental ages fell between 13 and 16-11, in which
case the correlation was 0.44. As Case 3 were considered
⁹ Since writing the above, one of the writers (Knollin) has tested a large
[...] by the latest Stanford Revision [...] and from [...] testings the
reliability of that revision [...] determined by the same method. The median
difference between mental [...] 10.9 months (corresponding to 10.6 months in
the [...] mentioned). It may be said, therefore, that the probable error of a
mental [...] is practically the same for adults as that of the former and
[...] taken also as one-half year.




only those individuals whose mental ages fell between 
14-11, in which case the correlation was —0.14. And as 
were considered only those individuals whose mental 
between 13 and 13-11, in which case the correlation was


It will be seen that differences in the heterogeneity of the group
make very great differences in the values of the coefficient
of correlation between the scores, so great, in fact, as to
rob the coefficient (0.85) of much of its significance. Doubtless
the correlation would have been considerably higher than 0.85
if a number of children of ages down to 3 or 4 had been
included in the group. The probable errors of the scores in the
four cases were also determined and found to be respectively
5.3 months, 5.8 months, 5.9 months, and 6.0 months, showing
that the heterogeneity of the group, as such, probably had no serious
effect on the value of the probable error of the score. These
last three values of the probable error of a score are believed to be
more nearly correct than the value 5.3 for reasons stated above.
The further fact that in as small a group of individuals as 31 the
probable error has practically the same value as that in the cases
of the larger groups speaks well for the validity of that value.

The point to be noted in this connection is that the value of a
coefficient of correlation between two series of measures depends
upon two variables, first, the amount of difference between the
members of each pair of x- and y-values (when expressed in the
same terms) and second, upon the degree of heterogeneity of the
group of individuals with regard to the magnitude being measured.
Now the reliability of a scale has nothing at all to do with the
heterogeneity of the group, except as we wish to consider the
probable error in relation to the heterogeneity. The reliability
should therefore be measured by values which are independent of
the heterogeneity. The probable error of a score fulfills this
condition and expresses the reliability in very significant terms;
it tells us what limit of error may be expected in 50 percent of
cases of measurement; and from it upon theoretical grounds the
limit of error that may be expected in any other percent of cases
may be calculated.
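
The contrast drawn here, a coefficient that falls as the group is narrowed while the probable error stays put, can be reproduced with made-up data. The sketch below is a simulation under assumed conditions (normally distributed true mental ages and independent normal errors in the two halves), not a reanalysis of the 180 cases; every number in it is hypothetical.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(pairs):
    """Return (r between halves, probable error of the whole-scale score)."""
    a = [p[0] for p in pairs]
    b = [p[1] for p in pairs]
    median_diff = statistics.median(abs(x - y) for x, y in pairs)
    return pearson_r(a, b), median_diff / 2.0


rng = random.Random(7)
pairs = []
for _ in range(20000):
    true_age = rng.gauss(168.0, 24.0)   # hypothetical true mental age, months
    pairs.append((true_age + rng.gauss(0.0, 8.0),
                  true_age + rng.gauss(0.0, 8.0)))

# Narrow the group to a band of mental ages, as in Cases 2 to 4 of the text.
narrow = [(a, b) for a, b in pairs if 156.0 <= (a + b) / 2.0 < 180.0]
r_all, pe_all = summarize(pairs)
r_narrow, pe_narrow = summarize(narrow)
print(round(r_all, 2), round(r_narrow, 2), round(pe_all, 1), round(pe_narrow, 1))
```

In runs of this kind the coefficient drops sharply in the narrowed group while the two probable errors remain nearly equal, which is the behavior the paragraph describes.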

It should be said that in the special case in which it is desired
to find the relative degrees of reliability of a number of tests which
have been given to the same group of individuals, a coefficient of
correlation between duplicate sets of scores in the same group may
be used as a convenient measure of the degree of reliability of a
test with which to compare a similarly derived measure of the
reliability of another test. This method of comparison is valid if
the degree of heterogeneity of the group is the same in all cases.
Coefficients derived from two different groups of individuals,
however, would not be comparable unless the degrees of heterogeneity
were the same for both.


THE USE OF A DIFFERENCE FORMULA FOR CORRELATION


It may be valuable in many cases to know the [...] between the
variability of the several scores of a single individual and the
variability of the scores of the several individuals composing the
group [...] a measure of [...]. A method for [...] has been described
by one of the writers (Otis).¹⁰ [...] a value may be obtained from
the differences between pairs of scores by the same individuals
[...] the same scale or a similar one [...] which corresponds exactly
to the coefficient of correlation [...], positive or negative, between
the first scores of the individuals and their second scores. [...]
The method may be [...].

Let us take the simplest case, when the B scores are directly
comparable to the A scores. By this is meant that differences between
the A and B scores due to practice effect or to differences in
difficulty between the scales may be considered negligible. Then any
difference between the two scores of a single individual [...]
variation [...]. Under these circumstances the variability of the B
scores will be the same as the variability of the A scores, so that
these variabilities may be considered equal for the purpose of
illustration.

Pearson has given a formula for correlation:¹¹ [...]

¹⁰ See Reference 4.
¹¹ See Reference 12.


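Expressed in terms of our variables, A and B, this formula becomes

$$r_{AB} = \frac{\sigma_A^2 + \sigma_B^2 - \sigma_{(A-B)}^2}{2\sigma_A\sigma_B}.$$

If we assume the simplest case where $\sigma_A = \sigma_B$, we have

$$r_{AB} = \frac{2\sigma_A^2 - \sigma_{(A-B)}^2}{2\sigma_A^2}, \quad \text{or} \quad r_{AB} = 1 - \frac{\sigma_{(A-B)}^2}{2\sigma_A^2}.$$

The coefficient of correlation between A and B is seen, therefore,
to be an expression of a certain relation between the variability
of the differences between A and B and the variability of the values
of A and B themselves. That is, the coefficient of correlation
increases with decrease in the ratio of these measures of variability.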

If, however, the A and B scores are not directly comparable
because of practice effect or differences in difficulty between
Scales A and B, then in order to find the probable error of
a score by the scale it becomes necessary, as has been explained
above, to transmute the B values into terms of the A scale, or
vice versa, by means of the line of relation between A and B.

In all cases of rectilinear relationship the most probable
position of the true line of relation between two variables, x and
y, is the line which passes through the point in the plot representing
the mean of the x values and the mean of the y values (assumed
to be the origin of the plot) and which has a slope such that the
tangent of the angle it makes with the horizontal axis is equal
to the quotient obtained by dividing the standard deviation of the
y values by the standard deviation of the x values.¹² That is, the
equation of the line which most probably expresses the true
relationship between x and y is $y = \frac{\sigma_y}{\sigma_x} x$. Now, if we
measure the deviations of the points in the plot from this line,
$y = \frac{\sigma_y}{\sigma_x} x$, and call any one of these deviations d, then
each value of d is the difference between a value of y and the
corresponding value of x after this has been transmuted into terms
of y, and it may be shown that the coefficient of correlation is
given by the formula

$$r = 1 - \frac{\sigma_d^2}{2\sigma_y^2}.$$

In the previous description of this formula¹³ it was called
the "Deviation Formula." It has since been considered preferable,
however, to call it the Difference Formula. This formula is
identical with the Pearson product-moment formula, as is
demonstrated in Appendix II.

¹² The proof of this statement is quite involved. It is given by Otis in an article
which is not yet published.
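
The stated identity between the difference formula and the product-moment formula can be checked numerically. The sketch below is not Appendix II; it takes the difference formula in the variance form $r = 1 - \sigma_d^2 / (2\sigma_y^2)$, with d measured from the line through the means whose slope is $\sigma_y/\sigma_x$, and compares it with the ordinary Pearson coefficient on a small set of invented scores.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def difference_formula_r(xs, ys):
    """r = 1 - var(d) / (2 var(y)), d being each point's deviation from the
    relation line through the means with slope sd(y) / sd(x)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    d = [(y - my) - (sy / sx) * (x - mx) for x, y in zip(xs, ys)]
    return 1.0 - statistics.pvariance(d) / (2.0 * sy ** 2)


# Invented scores in months, for illustration only.
scale_a = [120, 132, 144, 150, 156, 168, 180]
scale_b = [126, 128, 150, 146, 160, 162, 186]
r_pearson = pearson_r(scale_a, scale_b)
r_difference = difference_formula_r(scale_a, scale_b)
print(round(r_pearson, 4), round(r_difference, 4))
```

The two values agree to rounding error because the variance of d works out algebraically to $2\sigma_y^2(1 - r)$.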


In order to express this formula in terms of our variables, A
and B, we must give a notation to the B scores when rendered in
terms of the A scale by means of the line of relation whose
equation is $A = \frac{\sigma_A}{\sigma_B} B$. Let us call these transmuted B
values $B_A$. The difference formula then becomes

$$r_{AB} = 1 - \frac{\sigma_{(A-B_A)}^2}{2\sigma_A^2}. \qquad (a)$$

This formula, then, expresses a certain relation between the
variability of the difference between A and B values (when the B
values have been transmuted into terms of the A scale) and the
variability of the A values. Conversely, of course, if the A values
are converted into terms of the B scale, the difference formula
takes the form

$$r_{AB} = 1 - \frac{\sigma_{(B-A_B)}^2}{2\sigma_B^2}. \qquad (b)$$

Two modifications of the above difference formula are

$$r_{AB} = 1 - \frac{1}{2}\left(\frac{\text{M.D.}_{(A-B_A)}}{\text{M.D.}_A}\right)^2 \qquad (c)$$

and

$$r_{AB} = 1 - \frac{1}{2}\left(\frac{\text{A.D.}_{(A-B_A)}}{\text{A.D.}_A}\right)^2 \qquad (d)$$

in which $\text{M.D.}_{(A-B_A)}$ and $\text{M.D.}_A$ are the median deviations of
the distribution of $(A - B_A)$ and A values, respectively, in terms
of the A scale; and $\text{A.D.}_{(A-B_A)}$ and $\text{A.D.}_A$ are respectively the
average deviations of these distributions. As has been shown,
the probable error of a score (in terms of the A scale) is expressed
by the equation:

$$\text{P.E.}_A = \frac{\text{M.D.}_{(A-B_A)}}{\sqrt{2}}.$$

¹³ See Reference 4.

If we wish to use the value of the probable error of a score
in the correlation formula, we may take

$$r_{AB} = 1 - \left(\frac{\text{P.E.}_A}{\text{M.D.}_A}\right)^2.$$

In other words, the reliability coefficient of correlation between
duplicate scores of the individuals of a group is equal to 1
minus the square of the ratio of the probable error of a single
score to the median variability of the scores, when these values
are in the same terms. The advantage of enabling a correlation
to be obtained directly from a value of the probable error is peculiar
to the difference formula for correlation described herein.
Other corresponding measures of variability may be used,
such as interquartile ranges.
To illustrate the use of form (c) of the difference formula in the
present instance, the median deviation of the distribution of B
values ($\text{M.D.}_B$) was determined in the same way as the median
deviation of the distribution of values $(A_B - B)$, as shown in
Figure 2. This value, $\text{M.D.}_B$, was found to be approximately
20.1 months. Our value of $\text{M.D.}_{(A_B-B)}$ was 10.8. Therefore,
by the formula,

$$r_{AB} = 1 - \frac{1}{2}\left(\frac{10.8}{20.1}\right)^2 = .855.$$

This is the coefficient of correlation between scores by the two
halves of the scale. The corresponding value of r, found by the
Pearson product-moment method, was 0.850.
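
The illustration can be recomputed directly from the two median deviations printed in the text. The sketch below evaluates form (c) and, for comparison, the equivalent form written in terms of the probable error of a single score; the function names are illustrative only.

```python
import math


def r_from_median_deviations(md_difference, md_scores):
    """Form (c): r = 1 - (1/2) * (M.D. of differences / M.D. of scores)^2."""
    return 1.0 - 0.5 * (md_difference / md_scores) ** 2


def r_from_probable_error(pe_score, md_scores):
    """Equivalent form: r = 1 - (P.E. of a score / M.D. of scores)^2."""
    return 1.0 - (pe_score / md_scores) ** 2


r_c = r_from_median_deviations(10.8, 20.1)
r_pe = r_from_probable_error(10.8 / math.sqrt(2.0), 20.1)
print(round(r_c, 3), round(r_pe, 3))
```

From the rounded inputs 10.8 and 20.1 the result is close to the value given in the text; a difference in the third decimal place would be expected if the authors worked from unrounded medians.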
The error of a coefficient of correlation found by this modification
of the difference formula will depend, of course, among other
things, upon the care with which the medians involved are determined,
as shown in Figure 2. By the use of methods of approximation
it has been found possible to obtain coefficients by the
difference method about as quickly as by the method of unlike


signs, and the coefficients were believed to be much more [...].

By the use of formulas (a) and (b) the reliability of coefficients
is equal to that with the Pearson formula in cases of rectilinear
relationship. When the relation is curvilinear the difference formula
gives in certain instances a more accurate coefficient than
the Pearson formula, since it corrects the coefficient, in [...]
instances, for skewness of the distributions in somewhat [...]
the correlation ratio. Spearman's criticism¹⁴ of the
Pearson formula, that too great weight is given to extreme [...],
is obviated by the use of formulas (c) or (d). These [...]
partly account for slight differences between results [...]
methods.


COMMON FAILURE TO GIVE MEASURES OF RELIABILITY


Improper methods of [...] reliability constitute
a less prevalent fault on the part of most scale makers than
failure to give any measure whatever of the reliability [...]
scores. To the knowledge of the writers no measure [...]
probable error of a mental age by any of the several [...]
the Binet Scale has been determined before. The same is true
with regard to nearly all of the pedagogical scales.

Ayres says with regard to the reliability of his spelling scale:¹⁵
"By means of these standards children of the different grades in
any locality may be tested as to their spelling attainments and the
results compared with those which are found in the corresponding
grades in city systems in general. With less reliability the
attainment of a small number of grades or of one grade [...]
tested. With still less reliability the attainment of a single [...]
may be compared with these results." [...] of a score is given.

Thorndike says with regard to his "Scale for Measuring Ability
in Reading":¹⁶ "[...] for exact measures of individual [...]
for small differences amongst them, a scale with more paragraphs
and questions of each degree of difficulty is required. [...]
such scales, Beta, Gamma, Delta, and so on, are [...]
however, the present scale is the best to use." No measure [...]
reliability of a score is mentioned.

¹⁴ See Reference 11, p. 204.
¹⁵ Reference 2, p. 40.
¹⁶ Reference 9, 1915, p. 458.


BINETI PEDAGOGICAL SCALES 135 
1} +} 
\ could be tound ol the provable erro score 
int of his grammar and arithmetic scales He 
ith regard to the grammar tes In ite o 
ions, Which fundamentally are not of ver eriou 
these tests provide measures ol grammalt Knowledge 
lite accurate and far more accurate than ordinary 
testing and marking 
lso says with regard to his reading test ‘It is recon 
hul 27 r} 1} i 
the vocabulary test on page of be given In conjuns 
e test for speed and comprehension These three 
ner will serve as a very ade juate measure ol a Up s 
ilitv. 
ips no better comment can be made on the general 
makers and users of scales with regard to the reliability 
626 
be not that no matter whether the g 1 r mar 
nd el re yund, the scaler ts ibl 
\ submitted to thre lege profe ker 
rking teen erent tests. ( sm l reg 
the pudding is in the eating 1 the proo the re t 
P. | i rT it least to the extent of our t t eP. ] 
Yt be considered reliable. It may b las 
rports to measur¢ en the P. I 
To get a rough idea of the amount of the probable error of the Starch grammar scales, Scales A and B were submitted to 25 children of grades IV to VI, and all papers were graded as accurately as possible. By comparing Scales A and B, the probable error of a score was found to be approximately 1.27 steps, which, if accurate, would mean that in half the cases a score is in error by an amount slightly greater than the difference between the abilities for the seventh grade and for the junior class in high school, according to the norms
given by Starch. To be sure, no great reliance can be placed on our figures, as they were from so few individuals; and it is greatly to be hoped that Starch will be able to show that they are too high. Until such time the grammar tests are far from being "quite accurate."
The Pearson coefficient of correlation between the scores was found to be only 0.13, while the coefficient of correlation between the numbers of tests passed was 0.47. (These values are rendered comparable since the pupils were the same in both cases.) Here is evidence for the further presumption that the very simple method of counting the individual tests passed gives a more reliable score than the method advocated by Starch. No doubt the reliability of the scale could be made greater by the adoption of a better method of scoring.

* Reference 7, p. 21




It of Starch’s book. We will quote the paragraph wit! 


change of substituting the word “scales” for the word 


We believe it will depict with Startling significance 


which we are fast approaching. The paragraph as thy 


reads as follows 


and momentous problems in the operation of a school cd 


them, such as promotion, retardation. elimination 
eligibility for contests and societies. graduation, admi 


ii> 


higher institutions. re« ommendation for future Position 


like. Until a decade ago, no one questioned either thy 


the fairness of these me asurements. It was tac itly 


scales were almost absolutely correct, or very nearly 


fractional part of a point, even on a hundred percentag 


[he necessity for a measure of the reliability of th 
| any test was discussed in 1912 by Otis and Davidson 
ously, no child’s score in any pedagogical test can be saf. 


Classification or ¢ ommitment to an institution. when ther: 


bein error. Every maker of a scale for mental testing, w] 
an account of the standardization of such scale. should t! 


| give the P. E. or some similar measure of the reliability of 


trom the results are cautioned to bear in mind that ever, 


the true measure of the ability in question. A mental ag 
teen years of the Binet Scale, when used with adults, is ir 


a mental age of fourteen years plus or minus six months 


measures, not necessarily the defic iency of the scale as a n 


may be greater 


* Reference 5. 


of the scores obtained by them than the first paragraph o} 


tested by the surprisingly common practice of marki; 


in half the cases. These measures of reliability are then 
subject to refinement, and at best they show only the am 


imperfection of a scale as a measure of that ability which it 


ales are the universal mi asures of school work. Ny 


as a basis for promotion or grading, nor a test of intelliger 


means of taking into account the amount by which the scor 


by the scale. Those who are using scales and drawing conc] 


but an approximation, often only a very rough approximati 


of the ability which it purports to measure. The latter defici 




APPENDIX
I. Proof of the Formula
$$\text{P.E. (single measure)} = \sqrt{\tfrac{1}{2}} \times (\text{Median difference between measures})$$
Suppose that we have $n$ measures made upon a given magnitude. Denote these measures by $m_1, m_2, m_3$, etc., and assume them to be measured from the mean, $M$, of all the measures.

$$(m_1 - m_2)^2 = m_1^2 - 2m_1m_2 + m_2^2$$
$$(m_1 - m_3)^2 = m_1^2 - 2m_1m_3 + m_3^2$$
$$(m_2 - m_3)^2 = m_2^2 - 2m_2m_3 + m_3^2$$
etc.

Call the summation of the squares of all the differences between $m$'s, $\Sigma(m-m)^2$. Then
$$\Sigma(m-m)^2 = (n-1)(m_1^2 + m_2^2 + \dots + m_n^2) - 2m_1(m_2 + m_3 + \dots + m_n) - 2m_2(m_3 + \dots + m_n) - \text{etc.}$$
The first term of the right member is the sum of the $n-1$ differences involving $m_1$, and the like number involving $m_2, m_3, \dots, m_n$. The remaining terms constitute the summation of the middle terms $2m_1m_2$, etc.
Now to assist in simplifying this equation we will add $(m_1^2 + m_2^2 + \dots + m_n^2)$ to the first term and then immediately subtract these values by making the second term $-m_1(m_1 + m_2 + m_3 + \dots + m_n)$, etc., as shown below. Then
$$\Sigma(m-m)^2 = n(m_1^2 + m_2^2 + \dots + m_n^2) - m_1(m_1 + m_2 + \dots + m_n) - m_2(m_1 + m_2 + \dots + m_n) - \text{etc.}$$
Since $m_1, m_2, \dots, m_n$ are measured from their mean, their sum is zero. Hence, in the second member all terms after the first vanish. By definition of standard deviation,
$$\sigma_m^2 = \frac{m_1^2 + m_2^2 + m_3^2 + \dots + m_n^2}{n}$$
or
$$m_1^2 + m_2^2 + \dots + m_n^2 = n\sigma_m^2,$$
whence
$$\Sigma(m-m)^2 = n(n\sigma_m^2).$$
The first member of this equation expresses the sum of the squares of all the differences between measures. Since there are $n$ measures, there are as many differences as there are combinations of $n$ things taken two at a time, or $n(n-1)/2$ differences. Dividing by the number of differences,
$$\frac{\Sigma(m-m)^2}{n(n-1)/2} = \frac{2n\sigma_m^2}{n-1}.$$
Now the first member of this equation is the sum of the squares of a series of quantities, called $(m-m)$, divided by the number of such quantities. By definition, the square root of this quotient is the standard deviation of the quantities, which would be designated $\sigma_{(m-m)}$. That is,
$$\sigma_{(m-m)} = \sqrt{\frac{\Sigma(m-m)^2}{n(n-1)/2}}.$$


= Now since we have considered the measures as all having been 1 
‘ we! term the values of m, observed errors, and give om 
mn, Then, int notation, 


Now had an infinite number of cases, the value ol 
practi unity iT denoting the expression, when eq ials 
the subser pt a Te. tr the true value of the star 
a ol course ) iwavs equal to exactly Tin ” " x It has 


have 
tterences between Mcasures, SO we 
Since the distribution of errors and of differences between measures is approximately normal, we may assume that the median deviation of the distribution of errors, which we may call the probable error, is equal to 0.6745 times the standard deviation of the distribution of errors. Also, the median deviation of the distribution of differences $(m-m)$ is 0.6745 times the standard deviation of the distribution of these differences. That is,
$$\text{P.E. (single measure)} = 0.6745\,\sigma_{e,\text{true}}$$
and
$$\text{M.D.}_{(m-m)} = 0.6745\,\sigma_{(m-m)}.$$
Hence we may assume that since
$$\sigma_{e,\text{true}} = \sqrt{\tfrac{1}{2}}\,\sigma_{(m-m)},$$
therefore
$$\text{P.E. (single measure)} = \sqrt{\tfrac{1}{2}}\;\text{M.D.}_{(m-m)}.$$
That is, the probable error of a single measure equals the square root of one-half times the median of the differences between the pairs of scores of individuals. In the case of this study the single measures are scores (mental ages) by one or the other half of the Binet Scale, and the differences between scores are found after inequalities of difficulty between the two halves of the scale have been compensated for.
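The formula just proved can be illustrated with a short computation. The following is only a sketch under stated assumptions: the function name `probable_error_single` and the sample scores are hypothetical, and the "median difference" is taken to be the median of the absolute differences between paired scores, which is how the median deviation of a distribution centered on zero is usually computed.

```python
import math
import statistics


def probable_error_single(scores_a, scores_b):
    """Estimate the P.E. of a single measure from paired duplicate scores.

    Implements P.E. = sqrt(1/2) * (median absolute difference between the
    two scores of each individual). The pairing is by position in the lists.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("each individual needs one score from each half")
    # Absolute difference between the two measures of each individual.
    differences = [abs(a - b) for a, b in zip(scores_a, scores_b)]
    return math.sqrt(0.5) * statistics.median(differences)


# Hypothetical mental ages (in months) by each half of a scale.
first_half = [120, 132, 101, 144, 118]
second_half = [126, 130, 109, 140, 118]
pe = probable_error_single(first_half, second_half)
print(round(pe, 2))
```

With these made-up scores the absolute differences are 6, 2, 8, 4, and 0, whose median is 4, so the estimate is 4 times the square root of one-half. The sketch does not attempt the compensation for unequal difficulty of the two halves that the study applies before taking differences.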


It may be of interest to note that, following from the above pr 
corollary, 


Te, observed = Or. true, OF Conversely, 


bserved bY which formula 


J 
6 ielt membe hn the equation a 
” - irom whic! 
n—1 
Om 
7m 
n—1 
p 
| 
n 
n—1 




ie of the true stands 




ird error may be derived trom an 


r taken from anv given number of measures 
the Difference Formula and its Relatioy 
Product-Moment Formula 
‘ e text, for any values of ind 


ember is the Pearsor 
he difference formula 


rmal distribution 


deviation M dD. 


of the distribution 


dD ot 


on of the distribution 


rage deviation (4 


erquartile range (I.Q.R 

tion of the distribution 
cases of all distributions 
tles are approximately true 
M.D. 


1 
(M.D 


approximately 1 


\ 

2vmx + 

Sy? —2m Ux > 
+m? 
m* 

o o o 
a ” o oO 

a 
o o 
2a 
Ja 
o~ 


1 product-moment formula and 


he distribution =0.6745 times 
the distribution =0. 8453 


of the distribution 1.349 times 


which are approximately normal, 


Therefore, ‘ 
approximately 0.6745¢ 
approximately 0.67450? 


o 


times th 


observed 


vith the 
he righ 
standard 
si ind 
he stand 
the same 


Pearson 
For con 
m =the tangent of the angle of the line of relation 
7 
u 
/ 
vd 
2m 
| 
2n 
on 
; 
’ 
Tx 
\ 
y 
Or | 
| 
on 
) 
$ 




Similarly, 


1.D 
=approximately 1 $—, and 
1.D.)*y 
1.O.R.)%4 
1 approximately 1 2 
R 


It may be interesting to note the relation between the differ: 


and the Pearson formula 


ey in which 
[ft instead of measuring the deviations (d) of the points of the p 
line of relation, 4 r, these deviations are measured trom thi 


Or 


passes through the origin making an angle of 45° with the hori; 


equation of which is y=x, then m. the tangent of the angle of 
ind d=y—a Beginning with this equation, the derivation of 


formula is identical with the first eight steps of the derivation 
ence tormula given above w hen all m’s are omitted be ause in t} 


In a sense, therefore, this Pearson formula may be considered 
case of the more general difference formula given above 


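III. Proof of the Theorem that the Probable Error of a Score by the Whole Scale is $\sqrt{\tfrac{1}{2}}$ times the Probable Error of a Score by Half the Scale

We have seen from Appendix II that the reliability coefficient (the coefficient of correlation between the duplicate scores of single individuals) is
$$r = 1 - \frac{\sigma_{(x-y)}^2}{2\sigma_x^2}$$
(assuming $x$ and $y$ to be in the same terms).
We have seen also, from Appendix I, that
$$\sigma_{(x-y)} = \sqrt{2}\,\sigma_e \quad\text{or}\quad \sigma_{(x-y)}^2 = 2\sigma_e^2,$$
in which $\sigma_e$ is the standard deviation of true errors, that is, of differences between the several measures and the corresponding true measures.
Therefore
$$r = 1 - \frac{\sigma_e^2}{\sigma_x^2} \quad\text{or}\quad \sigma_e^2 = \sigma_x^2(1 - r).$$
Now by Spearman's theorem⁵ that
$$r_{x(p),x(p)} = \frac{p\,r_{x(q),x(q)}}{q + (p-q)\,r_{x(q),x(q)}}$$
when $p$ is one number of measures and $q$ another (in this case $p = 2$ and $q = 1$),
$$r_2 = \frac{2r_1}{1 + r_1},$$
in which $r_2$ = the reliability coefficient of the whole scale and $r_1$ = the reliability coefficient of the half scale. Hence
$$\sigma_{e_1}^2 = \sigma_{x_1}^2(1 - r_1) \quad\text{(half scale)} \qquad (3)$$
$$\sigma_{e_2}^2 = \sigma_{x_2}^2\left(1 - \frac{2r_1}{1 + r_1}\right) \quad\text{(whole scale)} \qquad (4)$$
in which the subscripts 1 and 2 refer respectively to the half scale and whole scale.
The obtained score by the half scale has been multiplied by 2, so that each score by the whole scale is the average of the scores by the two halves; that is,
$$x_2 = \frac{x_1 + x_1'}{2},$$
in which $x_1$ is the score by the first half of the scale and $x_1'$ the score by the second half. Then
$$\sigma_{x_2}^2 = \frac{\sigma_{x_1}^2 + 2r_1\sigma_{x_1}\sigma_{x_1'} + \sigma_{x_1'}^2}{4},$$
and $\sigma_{x_1} = \sigma_{x_1'}$ because the variabilities of the two half scales are presumably the same; therefore
$$\sigma_{x_2}^2 = \frac{2\sigma_{x_1}^2 + 2r_1\sigma_{x_1}^2}{4} = \frac{\sigma_{x_1}^2(1 + r_1)}{2}.$$
Substituting in (4),
$$\sigma_{e_2}^2 = \frac{\sigma_{x_1}^2(1 + r_1)}{2}\cdot\frac{1 - r_1}{1 + r_1}.$$
Simplifying,
$$2\sigma_{e_2}^2 = \sigma_{x_1}^2(1 - r_1) \qquad (6)$$
$$= \sigma_{e_1}^2,$$
so that
$$\sigma_{e_2} = \sqrt{\tfrac{1}{2}}\,\sigma_{e_1}.$$
And since the distribution of errors is presumably normal, we may say
$$\text{P.E. (whole scale)} = 0.6745\,\sigma_{e_2}$$
$$\text{P.E. (half scale)} = 0.6745\,\sigma_{e_1}$$
$$\text{P.E. (whole scale)} = \sqrt{\tfrac{1}{2}}\;\text{P.E. (half scale)}$$
Q. E. D.

⁵ Spearman, C. "Correlation calculated from faulty data." British Journal of Psychology, 3:271-295, October, 1910.
For suggestions concerning this proof, the writers are indebted to Dr. Truman L. Kelley.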
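The chain of substitutions in this proof can be checked numerically. The sketch below is illustrative only: the function names and the sample values (a half-scale standard deviation of 10 and a half-scale reliability of 0.60) are assumptions, not figures from the study. It applies Spearman's formula for doubling the length of a scale and confirms that the error variance of the whole scale comes out at one-half that of the half scale.

```python
import math


def whole_scale_reliability(r_half):
    """Spearman's formula with p = 2, q = 1: reliability of the whole scale."""
    return 2 * r_half / (1 + r_half)


def error_variance(sigma_x, r):
    """Variance of errors, sigma_e^2 = sigma_x^2 * (1 - r)."""
    return sigma_x ** 2 * (1 - r)


def whole_scale_pe(pe_half):
    """P.E. (whole scale) = sqrt(1/2) * P.E. (half scale)."""
    return math.sqrt(0.5) * pe_half


# Hypothetical half-scale values, chosen only for illustration.
sigma_x1, r1 = 10.0, 0.60

# Whole-scale score is the average of two equally variable half-scale scores.
sigma_x2_squared = sigma_x1 ** 2 * (1 + r1) / 2
r2 = whole_scale_reliability(r1)

var_e1 = error_variance(sigma_x1, r1)
var_e2 = sigma_x2_squared * (1 - r2)

print(r2, var_e1, var_e2)
```

Under these assumed values the half-scale error variance is 40 and the whole-scale error variance is 20, so the standard deviations of error (and hence the probable errors) stand in the ratio of the square root of one-half, as the theorem states.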


BIBLIOGRAPHY


2h 
a 
: 
r? / 1:268-314. D r. 191] 
( tv. 1913. 116 np 
La $: 676, 716. 750. 703. 
] r 1391-105. October 101 
\ | m ire! nt of abilit | 
4 6:°615-026. D er. 1915 
} u Ibid., 7:213-222, April, 191 
rem New York: M 
| 191 
mer Yor rs College. ( 
‘ 79 
| 1913 277 pp 
» | \ ry 
ri, 10:445-467, 17:40-67, November. 19 
4 MAN, D Korrelation 
ni t Z r 14-5 
( 1919 
KA turther thods of termir rrelatior 
t t Theory of Evoluti: XVI. Dranpers 
Memoirs, Biometric Series, 


Editorials 


A HUMAN LADDER 


The idea of the so-called human ladder is more or less familiar. In the army, captains in rating candidates for lieutenancies were asked to do so under several headings. One of these was "leadership." In order to rate candidates with respect to leadership, each captain was to set up his own scale. He was to evoke from his experience the best, poorest, and average lieutenants in point of leadership. Two other lieutenants, one midway between the best and the average, the other midway between the poorest and the average, completed a leadership ladder consisting of five rounds. A candidate could therefore be given the highest number of credits if the captain judged his leadership to be equal to the best within his experience. He could be given the next larger number of credits if his leadership seemed to match that of the lieutenant whom the rater had chosen to occupy the second position in his ladder, and so on.
This device has a number of defects. Because there are but five rounds, either the ladder must be short or the rounds must be far apart. If the particular captain's experience has been meager, he will probably not have encountered either the very best or the very poorest leadership. The spread of his scale (i.e., the length of his ladder) will, therefore, be small. His ratings may be relatively accurate, but only within a narrow range of the trait in question. Clearly such a captain's ratings will not have the same meaning as those of a captain who has drawn upon a rich experience. In fact, the five-round human ladder is defective because each person has a different one. It can only be thoroughly satisfactory if the five standard lieutenants have been drawn from a body truly representative of all lieutenants, and if they have been objectively selected for their positions on the scale or at least selected upon the consensus of a large number of persons competent to render judgment.
Suppose it were possible to construct a ladder or scale for measuring ability in a school subject which should have one hundred steps instead of five, which should be the same for



evervb ina 


mines 


x 

Would 

sed there 
1 that 


iil 
ll 
cit 
mre ) 
In the 
} 
I 


which should consist of steps objectiy 


vely used. Suppose further that this 


wo or more of them were constructed. ea 


school subiect ratings according 
can the same and tx capable oI 


Would not su¢ h a devic be wort 


e table possesses the properties whi h 


hibits a series of typical scores. In 

be one hundred of them; but in the fo 

are only ten or twelve. For examp! 

the eighth-grade percentile scores for H 


Test (1,691 pupils participating) run as follows:
Percentile   100   90   80   70   60   50   40   30   20
Score         95   94   92   81   75   68   57   51   41


his table will be simplified if wi 
‘ticipated were typical eig] 


posit Is not necessary, but if it is not 


ne group trom which the figures were 
be taken as typical, the condition 
ve figures may be described as 

lisposal one hundred eighth-gradk 


vould give thoroughly typical resp 


+ American History Test, and if we had thes: 


cording to the size of their score on 


led the pupil with the lowest score “N 


th the highest score “No. 100." we shou 


il N » had a score of 95, that Pupil No. 90 
nat Pu No. SO rad score otf 92. ete It is cl 
he rie res (95. O4 &] 71>. ef 
or rounds of a measuring ladder. and th 
) or stens namely. 95 UT) SO) 70 fi) 
is test riorms like Pupil 
pical eighth-grade children y 
) l¢ poorest \\ 
s | 60 rather than of 75. | 
nm test uals the scor 
er i we m hie 
i WC tila i ilmM 
St. OF course, pupils i r classes wil 
re re ot either 


| such that when 
fer e to 
ined mathem 
| 
been she 
American Histor 
rerceny l 
| ‘ re 
OS! war 
va 
Tie n 
<hibi 
‘ 
i 
LP 
\ 
rece 
! 



casionally, therefore, will he be given a derived scort 

If a child scores 76, we may interpolate to obtain his derived score. Manifestly it will be between 60 and 70. It is reasonable to place it one-sixth of the way from 60 towards 70. This would yield 62 to the nearest whole number.
The reader will note that a score of 68 on the test corresponds to the fiftieth or middle pupil of the arrangement. The score 68 is, therefore, our old friend, the median. A pupil who makes this score may be given a derived score of 50.
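The interpolation described here can be sketched in a few lines. This is a minimal illustration rather than a prescribed procedure: the name `derived_score` is hypothetical, the table holds only the percentile scores quoted in this editorial (from the 20-percentile to the 100-percentile), and straight-line interpolation between adjacent table entries is assumed.

```python
# (percentile, test score) pairs quoted in the editorial for the
# eighth-grade American History Test, lowest first.
PERCENTILE_TABLE = [
    (20, 41), (30, 51), (40, 57), (50, 68), (60, 75),
    (70, 81), (80, 92), (90, 94), (100, 95),
]


def derived_score(raw_score):
    """Convert a raw test score to a percentile score by linear interpolation."""
    pairs = zip(PERCENTILE_TABLE, PERCENTILE_TABLE[1:])
    for (p_low, s_low), (p_high, s_high) in pairs:
        if s_low <= raw_score <= s_high:
            fraction = (raw_score - s_low) / (s_high - s_low)
            return p_low + (p_high - p_low) * fraction
    raise ValueError("raw score lies outside the range of the table")


print(round(derived_score(76)))  # one-sixth of the way from 60 to 70
print(derived_score(68))         # the median
```

A raw score of 76 lies one-sixth of the way from 75 to 81, so the derived score is about 61.7, or 62 to the nearest whole number; a raw score of 68 returns 50. Scores below 41 raise an error because the lower entries of the table are not given here.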
It seems to us that the percentile method, converting, as it does, crude scores on tests into scores in terms of rank, supplies a need. We received a short time ago the following from a superintendent who has been doing a large amount of testing and considerable thinking: "Would it not be possible from a consideration of all the scores which must be available for the widely-used tests to publish more detailed standards? A median is of value for certain purposes. By it a superintendent can measure the standing of different grades and of the same grade in different buildings, but when he attempts to apply accomplishment tests as an aid to a reclassification program, the median does not give him a sufficient amount of information." The percentile table provides precisely what this superintendent had in mind, namely, "more detailed standards." It gives the median (the 50-percentile) and it gives other standards corresponding to the tenth, twentieth, thirtieth, etc. typical pupils, as the median score corresponds to the fiftieth pupil.
Another superintendent, who was about to initiate a testing program with a view to reclassifying pupils, raised a question as to how low a score should be in one test (the scores in other tests being satisfactory) to justify a retest in the one subject in which the score was low. We do not know any better way to get at this question than on the basis of percentiles. The question involves a comparison of scores made on different tests, including probably a score in an intelligence test. We need some method of scoring which shall be the same for all the tests concerned, and the percentile method has this advantage. The superintendent of whom we have just been speaking might decide that any child required retesting who had fallen as low as the thirtieth or fortieth pupil in a representative list (in other words, as low as the 30- or 40-percentile), provided his scores in intelligence and reading


were above the 60-percentile. We do not suggest these as fixed bases. Any basis which seems reasonable may be chosen. The point is that the percentile method enables us to define, so to speak, a limiting discrepancy, and to say that certain corrective action shall be taken when this limit is exceeded.
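The idea of a limiting discrepancy can be written out as a simple rule. The sketch below is hypothetical throughout: the function name `needs_retest`, the subject labels, and the default limits (a low limit at the 30-percentile and an upper limit at the 60-percentile) merely echo the example in the text, which itself offers them only as one reasonable choice.

```python
def needs_retest(percentiles, subject, low_limit=30, high_limit=60,
                 reference_tests=("intelligence", "reading")):
    """Flag a pupil for retesting in one subject.

    A retest is indicated when the pupil's percentile score in `subject`
    is at or below `low_limit` while his percentile scores in every one of
    the `reference_tests` are above `high_limit`.
    """
    low_in_subject = percentiles[subject] <= low_limit
    high_elsewhere = all(percentiles[name] > high_limit
                         for name in reference_tests)
    return low_in_subject and high_elsewhere


# Hypothetical pupil: weak in arithmetic, strong in the reference tests.
pupil = {"arithmetic": 25, "intelligence": 70, "reading": 65}
print(needs_retest(pupil, "arithmetic"))
```

Because every test is expressed on the same percentile scale, the same two limits serve for any subject; a superintendent preferring the 40-percentile as the lower limit would pass `low_limit=40`.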

Finally, our percentile scores for different tests may be combined. The score on a spelling test can be combined with the score on an arithmetic test although the original units are different. Of course, we are aware that there is a sense in which the steps of a percentile scale are not equal. For example, it will be observed that in the above table the step from the 30-percentile to the 40-percentile is six units, but that the step from the 40-percentile to the 50-percentile is eleven units. The step from the 80-percentile to the 90-percentile is only two units. Nevertheless, we assert that there is a very real sense in which these steps are equal. So far as the data are representative of typical children, they indicate that Pupil No. 40 is as much better than Pupil No. 30 as Pupil No. 50 is better than Pupil No. 40. The implication is that it is just as much more difficult for eighth-grade pupils to raise their score from 92 to 94 as it is to raise their score from 51 to 57. This must be so, for otherwise, the next ten pupils who exceed the performance of Pupil No. 80 could only attain that distinction by obtaining larger scores. This goes pretty deeply into the question of what difficulty is. We shall not enter into it further than to say without argument that our definition of difficulty rests upon the proportion of persons who can perform the act in question. With this definition of difficulty it is perfectly justifiable to argue that it is as hard, according to our percentile distribution, for an eighth-grade child to raise his score from 92 to 94 as it is to raise his score from 57 to 68. This is because a score of 94 is so much more difficult to attain than a score of 92 that approximately 10 percent more pupils fail to get it. Likewise, 10 percent more pupils fail to reach 68 than to reach 57.
It is our judgment, therefore, that a much greater use should be made of percentile distributions than is now being made. We feel sure that this larger use of them would manifest itself, if their nature and practical utility were better understood. We recommend to research workers that in their reporting they employ to a greater extent than they have heretofore this kind of human ladder.
B. R. B.


Reviews and Abstracts


E. H. Cameron, Editor


lassroom products which are to be measured, the methods of givi 


G. M. ano Hoke, KREMER J Ho measur New York: Macmillan j 
1920 RS py 
£59 pp 
nterest in the problem of tests and measurements is reflected in a recent 
H lo Mi ’ The two controlling ideas of tl liscussions, as : 
ce, are: “first, that the work in measurement should be handled more 
ndividual classroom teacher; and second, that the chief purpose to 
dard tests is the diagnosis of pupil ability and pupil difficultic ; 
rpose of tests is fundamental and cannot be over-emphasized 
s in a book on measurement can be organ fror t least two 
the important measurable phases of subject can be analyzed 
le h are lable can be descril I ithors have che : 
1 | ed uwnosis only in so f t n be carried or : 
tar 1tests. In this connection they do not attempt a critical § 
ll tests. On the other hand, they discuss only those tests which on 
r use, purpose, and adaptability have been found to be most serviceab! 
teacher 
of the book is doubtless best reflected in chapter m entitled ‘The 
nt of Handwriting.”” In this chapter the authors have discussed in det : 


ng results, standards of attainment, and remedial instruction. If the 
. been followed with equal thoroughness in all subjects, the book would 
; tinct contribution In its present form, it does not excel some of t! 
id measurements in practical valu 
mpresses the reader both favorably and unfavorabl Phis impre 1 i 
1 most effectively by certain contrasts .) The phases of handwriting 
measured are discussed in detail; similar discussions of other subject 
t limited number of chapters b) Certain chapters contain the lat 
n concerning tests in a given subject; other chapters omit many recent di 
ts, giving the impression that they were written some time ago The : 
| for arithmetic is well organized, fairly complete, and very suggestive : 
raphies for certain subjects are incomplete and poorly organized d) The : 
neral intelligence tests is emphasized; recent investigations concerning t : 
, r n of age and grade standing to accomplishment in school subjects are not dis i 
rly and pointedly 
t \ very commendable feature of the book merits special mentior It has been } 
n simple, clear English, which greatly increases its value to the classroom ' 


W.S. GRAY 




Relation of Gener 


n 


issad lable 
of education \ gre 
vocational educatior 


ussions, an 

IS against the 

tively estal 

ite educat on, ind especially the 
specific forms of education 

en has done i re il service in point 


most part, base 


Ipon a misapprehen ion concerning the ages of the 
ation designed 


itional education 


In the same chapter, there is a 1 


ocational work and 


RAPEER, L. W , irur ht New York Charl 
1920. 545 py 
volume ver mplete discussion of the history dev: 
z r t I lated rural school. The volume j based on rather 
‘ t ind or «ial t ry of the function of the rural publ 
ate I reneral aim held thet ai off 
uum held is t social ¢ 
ds. 
a t tter ler I be groupe 1 the prir pal needs of th 
be r | pr ot il vi nh they ) ire analyzed is, (1 
ef I € are t fundamental goals each chapter and are treat 
program of studie lhe various topics relat ‘ 
| " ind cce workers in t} 
ing the orga t ind cooperation of rural people in meet 
proble t the a t msolidated pub! hool 
1 lated described. Curricula for cor 
ir t uj I tivantage nd disad\y intages of the nsolidat 
| ve | graphy on the subject listed at t 
: thor seems to have gathered together a ut all that ha 
3 t vit robiem ol rural education. J bool ould be a 
By der 
\ 
’ 
DEN, Da jucation. ( Brief Course Series New York: M 
Company, 1920 SO) np 
Dr Ther en so man mpilations an econd-hand treat 
“i education that it is refreshing to read a straightforward. original. aut 
} noft bject. Such a discussion is David Snedden’s recent 
i may not agree with so of the conclusions, but after he has read th 
be no question in his mind as to what Dr Snedden thinks 
r The fir t three pters The Me ining of Vo« itional Educ ition 
Need of Better Vocational Education and “The \ 
2 Education” are a clear. unequiy «al, and seeming 
i it } il l pl e ol itional work ir our scher t 
; the dithculties that e impeded the progress of 
m ipprenension lack cl rity ar met 
ir jest the vocational enthusiast / 
ition first three t 
nity of purpose in all forms of leg nm 
Character of the illed gene ral 
In chapter thirteen, Dr. Sn r out 
severest critics of vo their fea 
h 
their criticisms pus | 
vocational ring 
those who imagine a conflict between VIE “education for det 
ad 
4 


t I 4 ipter thi OOK present m il I iut 
tal} livis f the f ipational wor 
It iseful bool It ib] i text 
| or industria! education 
S. J. Vaucr 
\\ nd » Laer New Yor t 
the Gr t War li be good lastiz ke ret 
sane direction to the reconstruction of educat ‘ 
r ectability not consistent th their pre t ‘ \ rigid ar 
praisal of college and universit I ( For mat ears tl 
tio! ll ot pern it an th gy ot I th t 
es of war are the new and mor ermanent emergencies f 
t tion calls upon the colleges be t re re lled inte 
t d lavs upon them a n nd f 
te task of this book to define this ne ligation of our An il 
\mer 1 to the rid, not only through their valuable tributic 
t tl gh their evervday education of Americal 1 
trusts the academic n imes ridicules it, ofte ture 
or the sterner pur] life. Wide-reacl riticisms of it 
lit hich is almost disconcert to the ed tor! el From 
inv of these he is defended by a careful analy t nature 
the pec ir demands of his worl Tr} rch for trut yr it 
olt prot or’s first oblig We call it | i 
r i es of life he has his everyday obligations B ) 
tion to the social order it whole, the obligation t ( 
t t to solve the more pressing concret« probl ft 
rs to sol Science has larger re t 
Without recognition of this obligation to ty the professor a 
ome at least unmoral. With respect to t bligation to the s 
i rather general failure of the demi \f ti rs recog 
t jority do not. When thi t dor I rid ol t 
) as an academic mind, hindered by abstract vhos racter t 
rgotten—the defect which is at the bottom of nearly all the ty 
of college education today College protessors ¢ lucate b thods t ' 
endent upon their habits of abstraction in investigatio Cher 
ynscious itional ideals for the student, save ar r adn 
hould be the chief educational motive I lles rh n 
education is to produce a definite American social er, in relation to a 
rid-order.’” Education should be a training of the rational will rather than ; 
ve reason “Subjects abandoned at graduation are an unqualified co! 
tion either of the worth of these subjects for the education of the given student : 
; 




icademi 


berate practi 


tory and extent 


h 


ne J Inior High Sc! 


is discussed in t 
bjects 
promotion by subject, methods 
cation, and the housing and equipn 
rial shows that the pract 
s frst, from the altogether too comn 


ol 
on 


study Othe 


f illustrative mate 


It 


| 
‘ 
eu 
in taught | ‘ re gniticant 
ident the { ire of his colleg The colleg 
itt stud law be mes to the 
R . ne means to attain then Correlation of st 
ition courses in the nior and senior vear 
| i if 
eT Ay ca and the relation of t} 
r mpting and even a partial] 
~onsipilities > Lhe \merican r 
| ju things ma ne must answer that + 
¢ professor elf But a significant fact ol trude ] 
r obligations to the social order 
ttained such a consciousness must spread 
Mot spread the cont t 
n ining tor the Ph.D. degree does not 
‘ eacher Ju f it the best thing for t 
re f nr 
DOS! inalvzed and maintair in 
J rom t one } hy +} 
is they ar pported an the 
refor y hye re il It i iv ry wh 
na administrators, for it reveals the j; 
| ner Cd (ideals and points th 
nd deli 
R. D. ¢ 
i \ 
New York: Harcourt. Bra 
i) 17° 
( tee of Ten in 1893 to the py tt 
| representative ed tional bod 
| rar t of the public hool sten Th t 
' ‘ t one featur ind then another in their attempt to pictur 
ind purr uM the intermediate or junior hig} hool 
om a broad and comprehensive point of view 
e r Functions of 
ref f pupil mizing tim 
ase izing time; in recognizing individu 
| vr guidan providing the beginnings of vocatio; | 
. , it a scence; in providing the conditior 
ing | 1 securing etter holarship nd in improving the dic wt. . 
and socializing opportunitic I “in 
types, variable and constant ek 
reorganization are departmental 
system, the staff, the sox ial orga 
K.oos’s xtensive collection 
ng Inior high s¢ hools resul 




t edu t d ise Strative 
t ra, bec Se 
t ) tie t r 
r 
ont It 
‘ 
rt 
t ter eds o t th P f 
trolled ‘ t t It 
1 lled ive ‘ 
| r yur 
i i ‘ 
line dil l 
r } | e next 


News Items and Communications


rtment will contain news items regarding t 
It will also serve a ‘ ring ho fe 
| tor on milar tonics eler ly of not mor han five 
These communications will be printed over th x ire 
nee g r’ tto 
Univer Hiinois, Urb Uline 
Dr \ ‘ er r € iluat the ¢ em oO 
4 Study of County State sc! is bee ed to the ce t 
cho 1 Svstems of stems in the state of Oregon \ Professor I L Stetson and 5 
Oregor Professor C. Ah she rhe 


Intelligence Tests of the us the National I ence T¢ eA ] 1 ' 
Determine Pro- Wavne County. Michigan. This test on th the ren 
ton it rural ol who ip} ed f the < t exa t 
Che report states that the ef object of the test to e to 
rat ers an additional che n the I 
tests is ne s reporti ‘ As et 
the writer but he inted it ther t te ent 
© 5 ir ses of intell test In t t | 
tv examinations were replac« oro ) er ence 
il tests 
The Edu , \ i nN ik Ma 1921 tains a pre 4 
Silent Reading liminary report on education ests vive nW ol , 
Ability of Ninth- direction o the st ite cle partr ent of education « ring the ear 
Grade Pupils 1920-1921 The conditions revealed in the case of silent reading ' 
are probably typical of conditions in other states. The following g 
juoted from the report because of the significance of the conditions 


P. E. BELTING 
| 
| 




ooperative 
festing in 


Michigan 


100 \ 


1 tittle 


The Practical Utility of the National Intelligence Tests
r received a letter from \J orne Williams, the: 
nta l ] 


1 regard to plans for test 


ecale ihe statement ire ised upon score obtained from the 
tandar 1 Silene I orr vith 
‘ é Rea g Test IT, Form 1 wit lirst-year high 
Of far nm rag 
n » come up to the sixth grad, 
1 I ret w that in the average village < 
i t xth grade level in their nlity to 
‘ nN cities 27 5°, of the fr men are 
Vary Irom zero to 95 of the cl 
th er rs ling al 
| ‘ erv unsatisfacter In either 
rit rig te chool ers to teach cla 
‘he bureau of Tests and Measurements of th, Uy 
i i ed to learn the 1 relerences ¢ 
ent th ret 
/ reterence to ins for cooperative testing 
t rta the lestions asked are of eneral interest 
| | © prelerences of practical sc} olmen relati to t 
| ioted fro unetins issued by G. M. Whinn 
| ¢ for 50 to MMMichigan towns to carry out 
testing iT ou t 
i 
\rithmet R 
ins d) Writing 
' school systems, replied to the 
provra labulation of the replies r 
mn an inte rence test 
iu i i } Vill n pr vr 
c rv re ot the hool grad 
ore 00' subr ~ ind 
' ! ur or more, though sometin these four ir t 
) ‘ Court \rithmetic Test 
19 to use the Monroe Silent Reading Test 
1Y to \yres Handwriting S 
iY ltou National int rel Pest. 
Lli ft 
ear the 4 
) 




eneral interest 
r is to eva ite n a practical ay, the comparative Seluiness ol 
i easures ot general abilit ol first-year pupils Four estions 
Hlow do the two scales c mare as regards the st of the material 
i e as regards convenience ll ger 3) How do the ( pare 
é rt scoru 4) What is the comparative a ra the 
la lications of ability? These fe lest ill be take ) 
re as? ra j It ‘ 
the contrast is striking, almost absurd. Scale B (the form used) of the National Intelligence Tests costs $1.60 per 25 blanks. This does not include the manual of directions, however, which is priced at 25 cents. Nor does it include shipping charges. Comparisons are most conveniently made in terms of cost per hundred pupils. The cost of blanks for 100 pupils is then seen to be $6.40. If we suppose that no more than one manual per 100 pupils is needed (or one manual for each three classes, if classes average 33 pupils in size) and if we suppose the shipping cost on the blanks (over 1000 odd sheets) to be no more than 35 cents, certainly a reasonable estimate, we have a total cost of $7.00 per hundred pupils.
In contrast the Cross-out Scale is sold at a flat rate of $1.00 per hundred, or one cent apiece. Further, with each hundred blanks are included four of the combined direction sheet, score sheet, and record blank, with table of norms. In order to secure a complete outfit nothing need be added to this dollar rate except shipping charges, which are
i flat 15 cents per 100 blanks if the goods are sent pare e] post, and is 
they are sent by express Lor plete mate ils for 100 pupils cost 
ntrast—$1.15 as compared with $7 .00 s surely strikin 
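
The arithmetic of the cost comparison above can be checked with a short sketch. It is only an illustration under stated assumptions: the manual of directions is taken at 25 cents (the reading that makes the printed total of $7.00 come out), one manual is allowed per hundred pupils, and the function and constant names are hypothetical.

```python
# All prices are in cents, as read from the passage above.
NIT_BLANKS_PER_25 = 160     # National Intelligence Tests: $1.60 per 25 blanks
NIT_MANUAL = 25             # assumption: manual of directions priced at 25 cents
NIT_SHIPPING_PER_100 = 35   # the writer's supposed shipping cost per 100 pupils
CROSS_OUT_PER_100 = 100     # Cross-out Scale: flat rate of $1.00 per hundred
CROSS_OUT_SHIPPING = 15     # parcel post, per 100 blanks


def cost_per_hundred_nit(manuals: int = 1) -> int:
    """Cost in cents of National Intelligence Test materials for 100 pupils."""
    blanks = NIT_BLANKS_PER_25 * (100 // 25)  # four packages of 25 blanks
    return blanks + manuals * NIT_MANUAL + NIT_SHIPPING_PER_100


def cost_per_hundred_cross_out() -> int:
    """Cost in cents of Cross-out Scale materials for 100 pupils."""
    return CROSS_OUT_PER_100 + CROSS_OUT_SHIPPING


def dollars(cents: int) -> str:
    """Format a whole number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    nit = cost_per_hundred_nit()
    cross = cost_per_hundred_cross_out()
    print(dollars(nit), dollars(cross), round(nit / cross, 2))
```

On these figures alone the two totals are $7.00 and $1.15, a ratio of a little over six to one.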
mpare as ret rd nh nm 
is again striking. The pupil’s blanks of the National Intelligence Test
pages, each 8 X 11. In contrast to this “booklet” (as it is called in the
put the Cross-out blank, consisting of four pages each 6 X 9 inches
\ Group S« Inte ‘ Use ‘ ’ 
89-1 I 
e reporte Mr. W t 
t present fler ir i eak he t ‘ re 
th t th cre r the «at \f- \\ 


e it im rl 
t r tor 101 r 
r cs Kt 184 ite ‘ 
ces e, at tha 50K ‘ 
Cros re rec « re t ) 
re 1a 20 percent o for ch ce 
edt e ( ere ‘ 
the Nat Con ee booklet f j 
ry score: 90 s re for 


J 
The final result of the correspondence which followed is a
the writer’s Cross-out Scale and the National Intelligence
s perhar arth has heth Form A and Form B ar ad — 


rdet 


isionall 


a 
154 JOURNAL OF EDUCATIONAL RESEARCH
E I trast thr ird toa Ol iterials is quite a reat 
Cross-out Scale includes directions for
= eal 7 ) i r r 
National Intelligence Tests consists of
Test 5. No record sheet is supplied
stated very briefly. The entire directions for scoring
14 d ring the National Int 
i ire sO] No tl directio ire re lire 
the Cross-out Scale the examination is so thoroughly systematized that
Intelligence Tests have special directions for each test
The reason for the special directions appears when the separate tests
carefully studied. The problem in each test of the Cross-out Scale is
of indicating the answer differs from test to test: thus
arabic numerals, tests 2 and 4 require
the writing of “D” or “S” between two terms. Form A is even
answers to the first test are arabic numerals, the answers to the
words written by the children; the third test required underlining of
words, the fourth test writing “D” or “S” between terms
requires the writing of 120 numbers. Scoring cannot but require
the examination is thus loosely organized. In fact, there is one test
requires a total of five special provisions in order to define the
And, in spite of all this elaborate definition, it is occasionally found necessary
certain special features to the scorer’s judgment.
So much with regard to the obtaining of the crude score. With the
Scale nothing more is necessary than to combine the crude scores for the
in order to obtain the final total score; and this total score may be obtained
recopying any of the scores on the individual tests. In using the National Intelligence
Tests two further processes are necessary after the score on each test
(a) the copying of the crude scores on to the first page and (b) the weighting




process of copying across is a minor task; but, in handling a large
counts up, and it can be avoided by a little ingenuity, in many
weighting requires time and trouble, involves further opportunity
scoring and, where both “rights” and “wrongs” must be considered,
labor in the obtaining of the crude score. Three tests in Scale B
Scale A call for further rehandling of the crude score. In three of the
number of rights is multiplied by 2; in three, the final score is “rights
mparat mer ir led 
comes the surprising part of the comparison. Both scales were
123 boys in the freshman class of the Atlanta Technical High School
general ability, using a rating scale modelled after the officers’ rating
the army, were also obtained from the teachers. In many instances
felt that they did not know the pupils well enough to rate them. Finally
of the one teacher who knew each section best was chosen. A single rating
does not give as reliable an indication of ability as might be
it will serve as a rough criterion for comparative purposes. The correlation
between ratings and score on the Cross-out Scale was found to be 0.40. The
between ratings and National Intelligence Tests, Form B, was only 0.28.
National Intelligence Tests cost nearly seven times as much as the Cross-out
are over nine times as bulky as the Cross-out materials. The
Intelligence Tests require some four or five times as much time to score. And
scales appear about equal, as measures of general ability. The contrast is surely
The writer does not wish to press the point; and he hopes very much that
be interested to make similar comparisons of other tests. The above correlations
be considered of rough suggestive value. But suggestive they surely


suggest (it seems to the writer) that there is at present a general lack of
among test builders, of practical requirements in the way of expense and
a tendency to sacrifice such practical requirements to considerations of
technique. It is the purpose of the present brief paper to suggest that
and accuracy of measurement, may not be so incompatible after all.
practical considerations are not more taken account of there is danger that
will come to be looked upon by the superintendent as a luxury, and by the
as a burden. Instead, tests and scales should become an indispensable con
in school work.


University


The Use of Mental Tests in the Whitman School 


When I went to the Whitman School as principal two years ago, the teachers
who had been there a year or more gave as the reason for the poor quality of the work,
the mentality of the children. Between January and June, 1920, the teachers,
working under the instruction and direction of Professor George W. Frasier of
the State Normal School, Cheney, Washington, gave the Stanford Revision of the
Binet Test to 126 children, about one-third of the school. Every member of the
A class was tested. The teachers selected the rest from those whom they considered
the best and the poorest in their classes.




Table I shows chronological age and grade of each child tested.
correctly graded according to chronological age; 45.2 percent are
retarded; less than 9 percent are accelerated.

TABLE I. AGE AND GRADE DISTRIBUTION

School Grade
I II III IV V VI VII VIII
6 to 8-6 5 2?
8-6 to 0-6 3 5 ll
‘ 1-6 to 11-6 4 3
11-6 to 12-6 4 4
12-6 to 13-6 2
13-6 to 14-6 I 2 2 S
14-6 to 15-6 l If)
to 16-6
ta 1 ]
Totals 35 10 21 19 9 5 6 21

Table II shows pupils distributed according to mental age.
A very noticeable change takes place. Whereas in Table I we had 45.2 percent
retarded, here we have
percent. In like manner we note 33 1/3 percent
properly placed, and 54.8 percent are accelerated. This points out that, though
is rated as one having a large percent of retardation, we have in
percent of acceleration, when mental age instead of chronological
The median I. Q. of the 126 children is 82.



TABLE II. MENTAL AGE AND GRADE DISTRIBUTION

School Grade
I II III IV V VI VII VIII Total
13 13
!
9 2 11
to 1 | 8 1s
to 9-4 1 12 14 | 28
to 10-6 1 3 4 ? |
11 1 |
to 11-6 3
to 12+ 1 3 3 6 13
6 l 1 2
it)
. to 15-6 1 6 7
to 10-6 ? 2 |
to 17-6 1 1
Totals 35 10 21 19 9 5 6 21 126

The first, fourth, and eighth grades presented the most difficult problems.
We sent home those in I-B whose mental age was less than five years, and we struggled along
with the others.
A course of study suited to the ability of the class had to be arranged
for the fourth grade.
Eighth-grade pupils
with mental age less than 13.6 and I. Q.


ler SS ll SOO ] ut, for neither the rade scho nor the 
their ‘ 
The t lete. s 1 t it the teachers ha 
est the entalityv of the Idren that inv so-called r 
aren a t >) that, alt h the best out of 376 
e bye erior (I. 110 or above that the 
( ( ed ¢ rh il ind a rec] t 
I tik if ike clear the reasons for the < 
‘ ith dist; where conditions are hetter 
ed 
\ 1S ¢ MEA 7? 5 \ hor | 
ter She pent t n the « 
t ver t ictorv mark We felt it u e] ¢ 
( t Ce might get met ig irom 
st nox nd senr cdi tor the 
She able to do hig] school work and w ll drop out 
Age 16.9. M.A. 11.11. I.Q. 74.4. is very ner
attributes her poor school work to the fact that she
very hard and even spends two or three hours with her
arithmetical problems that require any re
read. Placed according to M.A. she w
I! tal r rial work jr high school which she wil probably he } 
l i VOTK 
R. M. Age 13.8. M.A. 9.3. I.Q. 67.6. IV-A.
M. M. Age 8.4. M.A. 9.2. I.Q. 110.
R. and M. are brother and sister. R. is feebleminded, M.
in III-A. The parents are divorced. The father
from an apparently normal family alcoholic and degenerate. He
unable to support his family. The mother appears normal. R. is
untruthful, recently taken into custody by the Juvenile Court
outside of school. M.’s school work is satisfactory and her con


L. H. Age 14.8. M.A. 9.1. I.Q. 61.9.
M. H. Age 16.6. M.A. 11.6. I.Q. 71.8.
L. and M. are sisters. Their home conditions are wretched. We
left to go to work as a housemaid. She
positions but is unable to make good. L.’s mental age warrants placing her
should enter VI-B next semester.
P. G. Age 12. M.A. 14.8. I.Q. 122.2. VII-A. This is the highest
P.’s school work is superior, and the teachers recognize his ability.
imbecile sister who has never been in any school.
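
The I.Q. figures in the cases above can be reproduced as mental age divided by chronological age, both reckoned in months. The sketch below is illustrative only and rests on assumptions not stated in the text: ages such as "14.8" are read as 14 years 8 months, chronological age is capped at 16 years for older pupils, and the quotient is cut off (not rounded) at one decimal. The last two assumptions are inferred from the printed figures; the function names are hypothetical.

```python
ADULT_CAP_MONTHS = 16 * 12  # assumption: chronological age counted as at most 16 years


def to_months(age: str) -> int:
    """Convert an age written as 'years.months' (e.g. '14.8') to months."""
    years, _, extra = age.partition(".")
    return 12 * int(years) + (int(extra) if extra else 0)


def iq(chronological: str, mental: str) -> float:
    """I.Q. = 100 * mental age / chronological age, cut off at one decimal."""
    ca = min(to_months(chronological), ADULT_CAP_MONTHS)
    ma = to_months(mental)
    tenths = (1000 * ma) // ca  # integer arithmetic avoids rounding surprises
    return tenths / 10


if __name__ == "__main__":
    # (initials, chronological age, mental age) as printed in the cases above
    cases = [
        ("R. M.", "13.8", "9.3"),
        ("M. M.", "8.4", "9.2"),
        ("L. H.", "14.8", "9.1"),
        ("M. H.", "16.6", "11.6"),
        ("P. G.", "12", "14.8"),
    ]
    for initials, ca, ma in cases:
        print(initials, iq(ca, ma))
```

Under these assumptions the sketch returns the printed quotients for the five cases listed; without the 16-year cap the figure for M. H. would not match.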
In order to enable these children of inferior mentality to do work
shall organize opportunity rooms in which we shall not attempt to
regular course of study. Most of these children can learn to write and read
acquire the fundamentals of arithmetic, but they require more time than
devoted to them in the regular classroom.

This plan, while undoubtedly crude, is a step in the right direction
probably lead to a more extensive reorganization with a more definite vo ut
of the upper grades.
Frances Wersms 


Principal of the Whitman School,


Spokane, Washington 


National Association of Directors of 
Educational Research 


(E. J. Ashbaugh, Secretary and Editor)


OHIO STATE UNIVERSITY ESTABLISHES A BUREAU OF EDUCATIONAL
RESEARCH

1915 authorized the establishment at the State University of a
efficiency tests and survey. No funds were granted at the time since it
to await the recommendation of the university administrators.
Since that time the matter has been given much consideration,
believed to be ripe for the inauguration of the work.
of the new bureau a careful canvass was
training and experience
for the position. Finally Dr. B. R. Buckingham, Director of the Bureau of Educational
Research at University of Illinois, was chosen. He has accepted the
and will assume his new duties September 1.

Members of our Association will be delighted at the news of the signal honor
their pr
Dr. Buckingham. During the two years he was president of the
conducted its affairs with wisdom and energy. He is still a member of
committee and no member evinces a greater interest in its welfare.

Dr. Buckingham’s most signal achievement was the launching of the
The success which it has gained, the great practical
it has already been to the school men of the country, has been large
editorial ability and effort.

I am sure I do for the entire membership of our association, we congratulate
Ohio State University on securing Dr. Buckingham to head its new bureau
congratulate the school people of Ohio on the fact that a potent force in the solution
has been placed at their disposal; and we extend to Dr. Buckingham
congratulations upon his new honor and opportunity and our best wishes for
new field.

We are glad also to announce another promotion among our membership. Dr.
Woody, for the past four years professor of education in the University of
been brought to the University of Michigan
as the director of the Bureau of Mental Tests and Measurements.
We congratulate Michigan upon adding Dr. Woody to its corps; and we are very
to have Dr. Woody back where we may hope to see him at our meetings
and profit by his counsel.

H. W. Anderson, Assistant Director, Educational Research, Detroit, announces
the establishment of tentative norms on the standardized test in typewriting upon
which he has been working for the past two years. The tests are thoroughly practi
administered and scored, and yield clear cut value. Those
interested well to investigate this test.

Dr. H. T. Manuel, Director Educational Research, Colorado State Normal
Group Test of General Ability. The
of four parts, the first practice exercises while
two con
story arrangement. Part four consists
logical relations, and classification.
The test is designed for children in grades I to III in
the child either to read or to write. The situations are
The story arrangement is probably
the entire set. No of any kind are available
cooperation in standardization and also constructive
criticism of the tests.

Dean W. F. Russell, of the State University of Iowa,
was appointed an honorary
on a commission to adv
a national system of schools and sailed for the
be back at the university
department editor summer teaching

What are you doing now at the beginning of the


